Abstract

Background: Penstemon's unique phenotypic diversity, hardiness, and drought-tolerance give it great potential for 
the xeric landscaping industry. Molecular markers will accelerate the breeding and domestication of drought 
tolerant Penstemon cultivars by enabling the creation of genetic maps and clarifying phylogenetic relationships. Our objectives
were to identify and validate interspecific molecular markers from four diverse Penstemon species in order to gain 
specific insights into the Penstemon genome. 

Results: We used 454 pyrosequencing and GR-RSC (genome reduction using restriction site conservation) to identify
homologous loci across four Penstemon species (P. cyananthus, P. davidsonii, P. dissectus, and P. fruticosus) representing
three diverse subgenera with considerable genome size variation. From these genomic data, we identified 133 unique 
interspecific markers containing SSRs and INDELs of which 51 produced viable PCR-based markers. These markers 
produced simple banding patterns in 90% of the species × marker interactions (~84% were polymorphic). Twelve of
the markers were tested across 93, mostly xeric, Penstemon taxa (72 species), of which ~98% produced reproducible
marker data. Additionally, we identified an average of one SNP per 2,890 bp per species and one per 97 bp between 
any two apparent homologous sequences from the four source species. We selected 192 homologous sequences, 
meeting stringent parameters, to create SNP markers. Of these, 75 demonstrated repeatable polymorphic marker 
functionality across the four sequence source species. Finally, sequence analysis indicated that repetitive elements were 
approximately 70% more prevalent in the P. cyananthus genome, the largest genome in the study, than in the smallest 
genome surveyed (P. dissectus). 

Conclusions: We demonstrated the utility of GR-RSC to identify homologous loci across related Penstemon taxa. 
Though PCR primer regions were conserved across a broadly sampled survey of Penstemon species (93 taxa), 
DNA sequence within these amplicons (12 SSR/INDEL markers) was highly diverse. With the continued decline in 
next-generation sequencing costs, it will soon be feasible to use genomic reduction techniques to simultaneously 
sequence thousands of homologous loci across dozens of Penstemon species. Such efforts will greatly facilitate our 
understanding of the phylogenetic structure within this important drought tolerant genus. In the interim, this study 
identified thousands of SNPs and over 50 SSRs/INDELs which should provide a foundation for future Penstemon 
phylogenetic studies and breeding efforts. 

Keywords: Breeding domesticated Penstemon, Genome reduction, Homologous sequences, LTR retroelements, 
Plantaginaceae, Pyrosequencing, Repetitive elements 
Background

Interest is increasing in drought tolerant landscape 
plants due to water shortages experienced by many 
municipalities, especially in the Southwestern US [1,2]. 
However, the increased use of drought tolerant species also 
carries concerns regarding the introduction of non-native 
and potentially invasive species [3,4]. One way to address 
both issues is to landscape with native xeric flora [3]. 
Penstemon Mitchell (Plantaginaceae) has excellent potential 
for xeric landscapes and some Penstemon cultivars, adapted 
to mild climates, are already used throughout Europe as 
landscape plants [5-10]. Despite its potential, few 
Penstemon cultivars are used in xeric landscapes and 
there has been little to no drought or cold tolerant 
cultivar development for such landscapes [6-8,10-12]. 
Penstemon, with over 270 species, is one of the largest 
and most diverse plant genera of those that are strictly 
indigenous to North and Central America. This genus 
features a deep diversity in morphology, including a 
broad assortment of colors, flowers, and leaf structures.
Penstemon's putative center of origin is the arid
Intermountain West of the United States [13,14], and the genus
has frequently been discussed as an untapped resource for
xeric landscape cultivar development [5-7,9-11,15-17].
Because domestication and cultivar development, of any 
species, is slow, costly, and time consuming, few in 
the landscape industry have invested in native species 
breeding. However, given the recent and dramatic decrease 
in costs and relative ease of genotyping, we anticipate the 
wider utilization of marker assisted selection to accelerate 
breeding programs of native species, including drought 
tolerant Penstemon [18-20]. 

PCR-based markers are now essential tools to facilitate 
plant domestication, plant breeding, germplasm con- 
servation, phylogenetics, and genetic mapping studies 
[19-22]. Not surprisingly, little molecular or traditional 
genetic work has been reported for Penstemon [23]. To 
achieve broad resolution of the genome with three of the 
most efficient markers, SSRs (simple sequence repeats or 
microsatellites), INDELs (insertions/deletions), and SNPs 
(single nucleotide polymorphisms), vast amounts of 
DNA sequence are needed, particularly for SNPs where 
sufficient read depth is needed to distinguish true 
polymorphisms from sequence noise [24-26]. With 
the development of next-generation sequencing (e.g., Roche 
454-pyrosequencing) the cost of high-throughput marker 
discovery has been dramatically reduced [18]. Additionally, 
Maughan et al. [25] described a simple genome reduction 
method, known as GR-RSC (genome reduction using 
restriction site conservation), which reduces the genome
by > 90%, thereby making it feasible to redundantly
sequence the remaining genome with next-generation 
sequencing technologies. This process is repeated across 
multiple cultivars or species, with comparisons identifying 



many inter- and intraspecific homologous loci. Genomic 
reduction techniques consistently identify homologous 
loci between related species [20,27], and GR-RSC has 
enabled the identification and development of interspecific 
homologous SNPs [20]. 

We utilized GR-RSC to identify homologous sequences 
in four diploid (2n = 2x = 16) Penstemon species chosen to 
represent a range of taxonomic and genome size diversity 
[5,14]. Included in our analysis are two closely related 
species from the subgenus Dasanthera (P. davidsonii 
Greene and P. fruticosus (Pursh) Greene var. fruticosus), 
one from the subgenus Habroanthus (P. cyananthus 
Hook. var. cyananthus), and one (P. dissectus Elliot) from 
the monophyletic subgenus Dissecti, which is phenotypically 
divergent from all other Penstemon species. This experi- 
mental design allowed us to make broad inter- and intra- 
subgenera comparisons in Penstemon. The objectives of our 
study were three-fold: First, identify homologous SSR and 
INDEL markers from the four diverse species and test their 
conservation across 93, mostly xerophilic, Penstemon taxa. 
Second, identify conserved homologous sequences for SNPs 
for use in future interspecific studies. Third, assess observed 
variation in the GR-RSC sequences to gain insights into the 
Penstemon genome and possible reasons for the large size 
variation previously identified among the diploid taxa [5]. 

Methods 

Plant material and DNA extraction 

DNA from P. cyananthus, P. davidsonii, P. dissectus, and P. 
fruticosus leaf tissue was extracted using the CTAB purifica- 
tion method [28] with modifications [29] for the GR-RSC 
technique. The source localities and identification of these 
plants have been reported previously [5]. A single sample 
from each species with the highest quality and DNA 
concentration, as determined using a ND-1000 spectropho- 
tometer (NanoDrop Technologies Inc., Montchanin, DE), 
was selected to provide the 500 ng of DNA necessary for 
the genome reduction protocol. 

For the molecular marker experiments, we used 93 
Penstemon taxa. Leaf tissue was collected mostly from 
wild populations in the United States Intermountain 
West (Table 1). Each field-collected sample was identi- 
fied to species and (or) variety using taxonomic keys 
specific to the area [30,31]. We extracted DNA using 
Qiagen DNeasy Plant Mini Kit (Qiagen Inc., Valencia, 
CA), and concentrations were diluted to 25-35 ng/µL.

Genome reduction, barcode addition and 454 
pyrosequencing 

Genome reduction followed Maughan et al. [25]. Briefly, 
for each sample, EcoRI and BfaI were used for the initial
restriction digest, after which a biotin-labeled adapter was
ligated to the EcoRI restriction site and a non-labeled
adapter was ligated independently to the BfaI restriction



Table 1 Penstemon taxa (with collection counties) utilized in the 12-marker analysis with respective marker sizes (in bp)

| Species | County¹ | PS004 | PS011 | PS012 | PS014 | PS017 | PS032 | PS034 | PS035 | PS048 | PS052 | PS053 | PS075 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Subgenus Dasanthera | | | | | | | | | | | | | |
| P. davidsonii | Purchased² | 460 | 500 | 360 | 370 | 700 | 370 | 320, 950 | 520 | 440 | 220 | 320 | 140 |
| P. fruticosus v. fruticosus | Purchased | 460 | 500 | 360 | 410 | 700 | 340 | 360 | 520 | 420 | 220 | 320 | 140 |
| P. montanus v. montanus | Utah | 480 | 430 | 390 | 370 | 450 | 310 | 340, 310 | 470 | 430 | 200 | 390 | 115 |
| Subgenus Dissecti | | | | | | | | | | | | | |
| P. dissectus | Purchased | 440 | 860 | 370 | 380 | 750 | 370 | 320 | 920 | 380 | 220 | 320, 450 | 140 |
| Subgenus Habroanthus | | | | | | | | | | | | | |
| P. ammophilus | Kane | 480 | 800 | 400 | 430 | 470 | 300 | 1250, 340 | 470 | 420 | 230 | 360 | 125 |
| P. barbatus v. torreyi | Garfield | 420 | 800 | 400 | 490 | 500 | 320 | 340 | 500 | 410 | 200 | 360 | 110 |
| P. barbatus v. trichander | San Juan | 650, 480 | 850 | 420 | 500, 490 | 520 | 310 | 310 | 500 | 450, 410 | 200 | 370 | 130 |
| P. comarrhenus | Garfield | 650, 480 | 850 | 420 | 490 | 470 | 330, 310 | 310 | 500 | 430 | 200 | 360 | 125 |
| P. compactus | Cache | 440 | 850 | 400 | 500 | 490 | 300 | 300 | 480 | 410 | 210 | 390, 360 | 125 |
| P. cyananthus v. cyananthus | Wasatch | 420 | 860 | 400 | 410 | 750 | 370 | 310, 340 | 630 | 420 | 220 | 160, 320 | 160 |
| P. cyananthus v. subglaber | Box Elder | 440, 420 | 850 | 400 | 490, 470 | 500, 450 | 340, 310 | 310, 280 | 520 | 410 | 210 | 360 | 120 |
| P. cyanocaulis | Emery | 440 | 310 | 420 | 490, 470 | 520 | 330, 320 | 320 | 480 | 420 | 210 | 350 | 120 |
| P. eatonii v. eatonii | Utah | 420 | 800 | 420 | 490 | 450 | 320 | 300 | 500 | NM³ | 210 | 350 | 135, 125 |
| P. eatonii v. undosus | Washington | 420 | 850 | 420 | 470 | 420 | 320 | 290 | 650, 500 | 410 | 210 | 340 | 125 |
| P. fremontii | Uintah | 480 | 850 | 400 | 430 | 490 | 320, 310 | 340 | 500 | 420 | 220 | 370 | 130 |
| P. gibbensii | Daggett | 480, 440 | 850 | 420 | 490 | 420 | 320 | 300 | 480 | 430 | 220 | 360 | 130 |
| P. idahoensis | Box Elder | 440 | 800 | 400 | 410 | 470 | 310 | 340 | 500 | 430 | 250 | 340 | 130 |
| P. laevis | Kane | 440 | 850 | 400 | 470 | 470 | 310 | 350, 320 | 500 | 420 | 220 | 360 | 125 |
| P. leiophyllus v. leiophyllus | Iron | 480 | 850, 490 | 420 | 430 | 450 | 310 | 340 | 480 | 430 | 220 | 350 | 120 |
| P. longiflorus | Beaver | 440 | 800 | 420, 400 | 470 | 470, 450 | 330, 310 | 310 | 500 | 450 | 230 | 350, 220 | 125 |
| P. navajoa | San Juan | 480 | 800 | 400 | 490 | 550 | 330, 300 | 360, 340 | 500 | 450 | 230 | 410 | 135, 130 |
| P. parvus | Garfield | 480 | 850 | 450 | 500, 490 | 490 | 320 | 300 | 500 | 430 | 210 | 380, 360 | 130 |
| P. pseudoputus | Garfield | 480 | 800 | 420, 400 | 430 | 490, 420 | 320 | 340 | 480 | 450 | 230, 220 | 350 | 130 |
| P. scariosus v. albifluvis | Uintah | 440 | 850 | 400 | 490 | 490 | 310 | 320 | 480 | 410 | 210 | 370 | 115 |
| P. scariosus v. cyanomontanus | Uintah | 440 | 850 | 400 | 490 | 490 | 330 | 310 | 500 | 420 | 210 | 360 | 115 |
| P. scariosus v. garrettii | Duchesne | 490, 480 | 850 | 420 | 430 | 490 | 320 | 420, 340 | 520 | 430 | 230 | 360 | 125 |
| P. scariosus v. scariosus | Sevier | 480 | 1500, 1300 | 400 | 470 | 520 | 340, 310 | 310 | 500 | 430 | 210 | 360 | 130 |
| P. speciosus | Box Elder | 440 | 800 | 420 | 500, 490 | 490 | 320 | 340, 310 | 520 | 310 | 210 | 360 | 120 |
| P. strictiformis | San Juan | 480 | 850 | 400 | 470 | 500, 470 | 370, 310 | 350 | 500 | 410 | 220 | 370 | 125 |
| P. strictus | Wasatch | 480 | 850 | 400 | 410 | 450 | 310 | 350 | 520, 500 | 430 | 230 | 340 | 110 |
| P. subglaber | Sevier | 480 | 850 | 420, 400 | 470 | 490 | 310 | 350 | 500 | 430 | 220 | 350 | 115 |
| P. tidestromii | Juab | 390 | 850 | 420 | 470 | 490 | 310 | 300 | 480 | 400 | 190 | 360 | 140, 120 |
| P. uintahensis | Duchesne | 480 | 850 | 420 | 490 | 450 | 340 | 300 | 520 | 410 | 220 | 380 | 120 |
| P. wardii | Sevier | 480 | 800 | 420 | 430 | 490, 450 | 310 | 310 | 520 | 420 | 220 | 340 | 135, 120 |
| Subgenus Penstemon | | | | | | | | | | | | | |
| P. abietinus | Sevier | 440 | AD⁴ | 390 | 400 | 520 | 320 | 340 | 500 | 430 | 230 | 350 | 125 |
| P. acaulis | Daggett | 570 | 490 | 420 | 430 | 470 | 320 | 350 | 480 | 420 | 220 | 340 | 120 |
| P. ambiguus v. laevissimus | Washington | 520 | 850 | 390 | 490 | 470 | 320 | 1250, 340 | 500 | 400 | 220 | AD | 120 |
| P. angustifolius v. dulcis | Millard | 440 | 850, 600 | 400 | 490 | 520 | 370, 150 | 310 | 520 | 420 | 220 | 360 | 125 |
| P. angustifolius v. venosus | San Juan | 480 | 310 | 390 | 470 | 470 | 320, 150 | 340 | 550 | 450 | 220 | 360 | 135 |
| P. angustifolius v. vernalensis | Daggett | 480 | 800 | 390 | 430 | 470 | 370, 150 | 350 | 500 | 420 | 220 | 380 | 125 |
| P. atwoodii | Kane | 440 | 800 | 420 | 490, 390 | 470 | 300 | 310 | 480 | 400 | 180 | 360, 280 | 115 |
| P. bracteatus | Garfield | 440 | 850 | 400 | 500 | AD | 330, 310 | 320 | 520 | 420 | 230 | 380 | 125 |
| P. breviculus | San Juan | 650, 480 | 190 | 400 | AD | 470 | 500, 220 | 320 | 480 | 430 | 210 | 350 | 125 |
| P. caespitosus v. caespitosus | Uintah | 440 | 850 | 390 | 390 | 490 | 320 | NM | 190 | NM | 210 | 390, 370 | 115 |
| P. caespitosus v. desertipicti | Washington | 440 | 230 | 390 | 470, 370 | 470, 360 | 330 | 350 | 1000, 300 | 430 | 210 | 400, 380 | 130 |
| P. caespitosus v. perbrevis | Wasatch | 420 | 490 | 390 | 430, 400 | 470 | 320 | 350 | 520 | 380 | 220 | 340 | 120 |
| P. carnosus | Emery | 440 | 850 | 420 | 490 | 490 | 330, 300 | 310 | 500 | 430 | 220 | 350 | 130, 120 |
| P. concinnus | Beaver | 440 | 800 | 420 | 430, 400 | 500 | 480 | 350 | 480 | 420 | 190 | 700, 360 | 120 |
| P. confusus | Washington | 480 | 850 | 420 | 490 | 520 | 300 | 320 | 480 | 450 | 220 | 350 | 125 |
| P. crandallii v. atratus | San Juan | 420 | 490 | 390 | 500 | 450 | 370 | 320 | 280 | 400 | 190 | 350 | 120 |
| P. crandallii v. crandallii | San Juan | 420 | 340, 190 | 390 | 500 | 450 | 370, 340 | 310 | 280 | 380 | 190 | 350 | 115 |
| P. deustus v. pedicellatus | Teton | 420 | 850 | 420 | 430 | 550 | 340 | 320 | 550 | 340 | 230 | 370 | 130 |
| P. dolius v. dolius | Millard | NM | 710 | 400 | 400 | 490, 320 | 530, 300 | 320 | 480 | 450, 420 | 180 | 340 | 105 |
| P. dolius v. duchesnensis | Duchesne | 420 | AD | 420 | 400 | 500 | 340 | 320 | 480 | 410 | 180 | 360, 340 | 140, 120 |
| P. eriantherus v. cleburnei | Daggett | 420 | 850 | 420 | 410 | 450 | 480 | 300 | 500 | 490, 420 | 190 | 390, 360 | 140, 130 |
| P. flowersii | Uintah | 480 | AD | 420, 400 | 490, 430 | 470 | 300 | 350 | 520 | 420 | 220 | 360 | 125 |
| P. franklinii | Iron | 480 | 800 | 400 | 430 | 470 | 320, 300 | 350 | 520 | 420 | 240 | 380 | 125 |
| P. goodrichii | Uintah | 420 | 650 | 390 | 400 | 490 | 480 | 310 | 480 | 400 | 200 | 370, 350 | 135 |
| P. grahamii | Uintah | 420 | 850 | 400 | 390 | 470 | 530, 320 | 350 | 500 | 420 | 230 | 500, 370 | 120 |
| P. humilis v. brevifolius | Cache | 390 | 850 | 370 | 500 | 450 | 340, 320 | 320 | 480 | 500 | 220 | 280 | 115 |
| P. humilis v. humilis | Box Elder | 420 | 850 | 390 | 410 | 520 | 330, 310 | 360 | 500 | 500, 470 | 220 | 350 | 120 |
| P. humilis v. obtusifolius | Washington | 420 | 800 | 390 | 520, 490 | AD | 330 | 340 | 480 | 470 | 200 | 350 | 120 |
| P. immanifestus | Millard | 480 | 710 | 420 | 490 | 380 | 300 | 320 | 480 | 410, 380 | 220 | 400, 360 | 120 |
| P. lentus v. albiflorus | San Juan | NM | 430 | 390 | 430 | 450 | 320 | 300 | 500 | 400 | 210 | 470, 370 | 140 |
| P. lentus v. lentus | San Juan | 480 | 850 | 400 | 430 | 470 | 300 | 310 | 500 | 410 | 210 | 400, 370 | 145 |
| P. linarioides v. sileri | Washington | 420 | 850 | 370 | 490, 390 | 470 | 330, 310 | 350 | 470 | 400 | 210 | 370 | 125 |
| P. marcusii | Emery | 390 | 800 | 450 | 370 | 490 | 310 | 340, 320 | 500 | NM | 200 | 390, 360 | 120 |
| P. moffatii | Grand | 390 | 800 | 420 | 390 | 490 | 330 | 340 | 480 | 430 | 290, 200 | 380, 350 | 140 |
| P. nanus | Millard | 480 | 800 | 420 | 390 | 470 | 280 | 320 | 470 | NM | 180 | 360 | 120 |
| P. ophianthus | Sevier | 520 | 850 | 420 | 370 | 900, 750 | 330, 310 | 310 | 480 | 420 | 190 | AD | 115 |
| P. pachyphyllus v. congestus | Kane | 480 | 850 | 400 | 430 | 470, 380 | 320 | 310 | 520 | 410 | 250 | 370 | 170 |
| P. pachyphyllus v. mucronatus | Daggett | 440 | 800 | 390, 370 | 430 | 500 | 320 | 300, 280 | 520, 500 | 430 | 220 | 350 | 120 |
| P. pachyphyllus v. pachyphyllus | Duchesne | 480 | 850 | 390 | 410 | 490 | 370, 330 | 340, 240 | 400, 190 | 500, 430 | 290, 230 | 380, 220 | 125 |
| P. palmeri v. palmeri | Washington | 440 | 850 | 400 | 500, 490 | 520, 490 | 330 | 310 | 500 | 430 | 210 | 380 | 125 |
| P. petiolatus | Washington | 420 | 1000 | 400 | 500, 490 | 500 | 330 | 300 | 480 | 420 | 210 | 380 | 145 |
| P. pinorum | Washington | 480 | 800 | 420 | 610 | 500 | 480 | 310 | 480 | 470 | 200 | 390 | 125 |
| P. procerus v. aberrans | Garfield | 440 | 1000, 850 | 450, 370 | 520 | 520 | 330 | 360 | 480 | 410 | 220 | 370 | 115 |
| P. procerus v. procerus | Iron | 420 | 850, 550 | 370 | 490 | 470 | 340, 310 | 360 | 470 | 470 | 220 | 340 | 120 |
| P. radicosus | Daggett | 420 | AD | 420 | 490 | 470 | 330, 310 | 310 | 500 | 450 | 200 | 360 | 125 |
| P. rydbergii v. aggregatus | Box Elder | 420 | 850 | 400 | 520 | 500 | 340 | 360 | 520 | 470 | 210 | 380 | 115 |
| P. rydbergii v. rydbergii | Rich | 420 | 710 | 400 | 520 | 500 | 370 | 320 | 500 | 470, 430 | AD | 390 | 115 |
| P. thompsoniae | Kane | 420 | AD | 370 | 500 | 450 | 340, 320 | 340 | 500 | 410 | 220 | 390, 370 | 130 |
| P. tusharensis | Beaver | 420 | 1300, 230 | 370 | 430 | 450 | 320 | 320 | 500, 300 | 410 | 230 | 340 | 120 |
| P. utahensis | San Juan | 480 | 410 | 420 | 430 | 490 | 300 | 310 | 500 | 410 | 220 | 370 | 125 |
| P. watsonii | Sevier | 420 | AD | 370 | 490 | 470 | 320 | 350 | 480 | 490 | 220 | 350 | 120 |
| P. whippleanus | Iron | 420 | 800 | 400 | 370 | 450 | 310 | 350 | 500 | 430 | 210 | 370 | 105 |
| P. yampaensis | Daggett | 570 | 710 | 400 | 430, 390 | 490 | 500, 320 | 310 | 480 | 410 | 260, 230 | 340 | 120 |
| Subgenus Saccanthera | | | | | | | | | | | | | |
| P. leonardii v. higginsii | Washington | 390 | 1300 | 420, 400 | 490, 430 | 550, 520 | 320 | 310 | 480 | 470 | 250 | AD | 125 |
| P. leonardii v. leonardii | Utah | 440 | 800 | 420 | 430 | 490 | 320 | 340, 320 | 480 | 500 | 240 | 370 | 120 |
| P. leonardii v. patricus | Tooele | 440 | 850 | 370 | 470 | AD | 370 | 310 | 550, 520 | 470 | 230 | 380 | 115 |
| P. platyphyllus | Salt Lake | 420 | 800 | 400 | 430 | 470 | 330 | 310 | 520 | 430 | 240 | AD | 135 |
| P. rostriflorus | Washington | 420 | 1100, 430 | 400 | 410 | 420 | 320, 300 | 290 | 500 | 490, 430 | 470 | 370 | 120 |
| P. sepalulus | Utah | 420 | 800 | 400 | 470 | 450 | 330 | 310 | 500 | 430 | 230 | 390 | 130 |
| Total unique molecular weight bands | | 9 | 18 | 6 | 10 | 14 | 12 | 12 | 13 | 12 | 11 | 17 | 11 |
| Total pairs of dual molecular weight bands | | 6 | 7 | 7 | 16 | 11 | 28 | 14 | 7 | 8 | 4 | 20 | 7 |
| Total monomorphic markers | | 85 | 80 | 86 | 76 | 80 | 65 | 78 | 86 | 81 | 88 | 69 | 86 |
| Total NM | | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 0 | 0 | 0 |
| Total AD | | 0 | 6 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 1 | 4 | 0 |

¹ All counties are in Utah except Teton Co., which is in Wyoming.

² Purchased = P. davidsonii and P. fruticosus were purchased from nurseries in Utah Co., Utah, while P. dissectus was purchased from a nursery in Aiken Co., South Carolina.

³ NM = no marker.

⁴ AD = ambiguous data (usually multiple bands and/or smearing).
site. Next, a non-labeled size exclusion step using Chroma
Spin + TE-400 columns (Clontech Laboratories, Inc.,
Mountain View, CA) and magnetic biotin-streptavidin
separation (Dynabeads M-280 Streptavidin, Invitrogen Life
Science Corporation, Carlsbad, CA) were performed. Unique
multiplex identifier (MID) barcodes were added independently
to each species using primers complementary to the
adapter and cut sites (Table 2). Preliminary amplification
was performed using 95°C for 1 min, followed by 22 cycles of 95°C for
15 s, 65°C for 30 s, and 68°C for 2 min. PCR products
were loaded into a 1.2% agarose FlashGel DNA
Cassette (Lonza Corporation; Rockland, ME) to verify
smearing and adequate amplification in preparation
for pyrosequencing.
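
The double digest at the heart of the genome reduction can be illustrated in silico. The following is a minimal sketch, not the published protocol: it assumes the standard recognition sites for EcoRI (G^AATTC) and BfaI (C^TAG), ignores adapters, size exclusion, and the biotin capture chemistry, and simply keeps fragments flanked by one cut site of each enzyme. The function names are hypothetical.

```python
import re

# Recognition site and cut offset within the site (assumed standard values).
ECORI = ("GAATTC", 1)  # G^AATTC
BFAI = ("CTAG", 1)     # C^TAG


def cut_positions(seq, site, offset):
    """Return the cut coordinate for every occurrence of a recognition site."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq):
    """Yield (fragment, left_enzyme, right_enzyme) for an EcoRI + BfaI digest."""
    cuts = [(p, "EcoRI") for p in cut_positions(seq, *ECORI)]
    cuts += [(p, "BfaI") for p in cut_positions(seq, *BFAI)]
    cuts.sort()
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        yield seq[start:end], left, right


def mixed_end_fragments(seq, min_len=0, max_len=10**9):
    """Keep fragments with one EcoRI end and one BfaI end within a size window."""
    return [
        frag
        for frag, left, right in double_digest(seq)
        if left != right and min_len <= len(frag) <= max_len
    ]
```

Restricting the output to mixed-end fragments within a size window is what shrinks the sampled fraction of the genome; the 500-600 bp window used later in the protocol would correspond to `min_len=500, max_len=600` in this sketch.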

After the initial PCR, concentrations of each of the 
four species samples were determined fluorometrically 
using PicoGreen® dye (Invitrogen, Carlsbad, CA). 
Samples were then pooled using approximately equal 
molar concentrations of each species except for P. 
cyananthus (genome size = 1C = 893 Mbp), where the 
molar concentration was doubled to maintain a similar 
genomic representation compared to the other three 
species with smaller genome sizes (P. dissectus, 1C = 462 
Mbp; P. davidsonii, 1C = 483 Mbp; and P. fruticosus, 
1C = 476 Mbp; [5]). DNA fragments between 500-600 bp 
were selected following Maughan et al. [25]. Sequencing 
was performed by the Brigham Young University 
DNA Sequencing Center (Provo, UT) using a half 
454-pyrosequencing plate, Roche-454 GS FLX instrument,
and Titanium reagents (Branford, CT).
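
The pooling ratio can be checked with a short calculation. This sketch assumes that pooling weights should scale with 1C genome size relative to the smallest genome and then be rounded to whole multiples; the genome sizes are those cited above, while the function names are illustrative only.

```python
# 1C genome sizes in Mbp, as cited in the text from [5].
GENOME_SIZE_MBP = {
    "P. cyananthus": 893,
    "P. dissectus": 462,
    "P. davidsonii": 483,
    "P. fruticosus": 476,
}


def pooling_weights(sizes):
    """Relative molar weight of each species, scaled to the smallest genome."""
    smallest = min(sizes.values())
    return {species: size / smallest for species, size in sizes.items()}


def rounded_weights(sizes):
    """Round the relative weights to whole multiples for bench use."""
    return {species: round(w) for species, w in pooling_weights(sizes).items()}
```

Under these assumptions P. cyananthus receives a weight of about 1.93, which rounds to the doubling described in the text, while the other three species stay at 1.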

Sequence assembly 

Sequence data were sorted by species using their unique 
MID species barcode (Table 2) by means of the software 
package CLC Bio Workbench (v. 2.6.1; Katrinebjerg, 
Aarhus N, Denmark). Following sorting (Table 2), assemblies
were performed using Roche's de novo assembler,
Newbler (v. 2.6), which yields consensus sequences (contigs) 
of all individual reads, from each independent species, 
for use in subsequent analyses. 
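
Sorting reads by barcode amounts to a prefix lookup. The sketch below is not the CLC Bio workflow; it assumes each read begins with the 10-base MID that forms the 5' portion of the Table 2 primers, requires an exact match, and trims the barcode from assigned reads. The function name is hypothetical.

```python
# First 10 bases of the MID primers listed in Table 2.
MIDS = {
    "ACGAGTGCGT": "P. cyananthus",
    "ACGCTCGACA": "P. dissectus",
    "AGACGCACTC": "P. davidsonii",
    "AGCACTGTAG": "P. fruticosus",
}
MID_LENGTH = 10


def sort_reads_by_mid(reads):
    """Bin reads by their leading MID barcode and trim it from assigned reads."""
    bins = {species: [] for species in MIDS.values()}
    bins["unassigned"] = []
    for read in reads:
        species = MIDS.get(read[:MID_LENGTH])
        if species is None:
            bins["unassigned"].append(read)  # keep unmatched reads untouched
        else:
            bins[species].append(read[MID_LENGTH:])
    return bins
```

Production demultiplexers usually tolerate one or two barcode mismatches; exact matching is used here only to keep the example short.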



A full assembly (all individual reads of all four species 
pooled together) was performed by Newbler with 
"complex genome" parameter set and a trim file with 
MID barcodes specified; all other parameters were left 
to their defaults. For all subsequent species assemblies 
(all individual reads of one species), these same parameters 
were used with a few added conservative options selected: 
an expected depth of '10' (20 default), a minimum overlap
length of '50' (40 default), and a minimum overlap identity 
of 95% (90% default). 

Repeat element identification 

Assembled sequences from all four species were masked 
for possible genome wide repetitive elements using a 
combination of RepeatModeler and RepeatMasker [32]. 
RepeatModeler is a de novo repeat element family identifi- 
cation and modeling algorithm that implements RECON 
[33] and RepeatScout [34]. RepeatModeler scanned all 
contigs from the four Penstemon species assemblies and 
produced a library of predicted repeat element
models. Using this reference
library, RepeatMasker then scanned the four species to 
filter out repetitive elements. Singletons were omitted 
from the analysis. To assess possible repetitive element 
biases with RepeatMasker when implementing a de novo
library from RepeatModeler, we analyzed the GR-RSC data
from Arabidopsis RILs (recombinant inbred lines) Ler-0
and Col-4 from the study of Maughan et al. [35], compared to
the Arabidopsis non-reduced genome downloaded from 
TAIR (The Arabidopsis Information Resource) [36]. 

Marker development, verification, and use 

To identify SSRs, INDELs, and SNPs, we used the software
MISA and SNP_Finder_Plus (a custom Perl script)
[25,37,38]. RepeatMasker was used to identify
and mask transposable elements. MISA parameters were 
set as follows: di-nucleotide motifs had a minimum of 
eight repeats, tri-nucleotide motifs had a minimum of six 
repeats, tetra-nucleotide motifs had a minimum of 
five repeats, and 100 bp was set as the interruption 



Table 2 The four multiplex identifier (MID) barcode (adapter) primers used for the genomes of Penstemon cyananthus, P. dissectus, P. davidsonii, and P. fruticosus

| Species | MID ID # | EcoRI MID primer¹ | BfaI MID primer² |
|---|---|---|---|
| P. cyananthus | MID 1 | 5'-ACGAGTGCGTGACTGCGTACCAATTC | 5'-ACGAGTGCGTGATGAGTCCTGAGTA |
| P. dissectus | MID 2 | 5'-ACGCTCGACAGACTGCGTACCAATTC | 5'-ACGCTCGACAGATGAGTCCTGAGTA |
| P. davidsonii | MID 3 | 5'-AGACGCACTCGACTGCGTACCAATTC | 5'-AGACGCACTCGATGAGTCCTGAGTA |
| P. fruticosus | MID 4 | 5'-AGCACTGTAGGACTGCGTACCAATTC | 5'-AGCACTGTAGGATGAGTCCTGAGTA |

¹ The "AATTC" at the 3' end of the primer is where the adapter complements the EcoRI cut site, and the preceding "C" is where the base was changed to avoid further enzymatic cleavage of the fragment.

² The "TA" at the 3' end of the primer is where the adapter complements the enzyme cut site, and the preceding "G" is where the base was changed to avoid further enzymatic cleavage of the fragment.
(max difference between two purported SSR alleles). For
the comparison of SSR frequency and repeat motifs 
across species, "unmasked" assembly files were used to 
remove bias caused by masking low complexity reads. 
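
The repeat-count thresholds can be expressed as a small regular-expression search. This is a simplified stand-in for MISA, not its implementation: it applies only the minimum repeat counts stated above (di ≥ 8, tri ≥ 6, tetra ≥ 5), skips motifs that are themselves repeats of a shorter unit, and does not model the 100 bp interruption setting. All names are hypothetical.

```python
import re

# Motif length -> minimum number of tandem repeats (from the text).
MIN_REPEATS = {2: 8, 3: 6, 4: 5}


def is_primitive(motif):
    """True when the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)
```

Because the search reports the leftmost match, the motif may be a rotation of the one a curator would name (for example CAT instead of ATC when the repeat is preceded by a C).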
The following parameters were used to define the heur- 
istic thresholds for SNP_Finder_Plus: 8x minimum read 
depth for the SNP, 30% proportion of the reads 
representing the minor allele and 90% identity (an indi- 
cation of homozygosity within a single species used in a 
dual-species assembly) required for each SNP locus. 
These parameters also helped compensate for sequencing
and assembly errors, allowing greater confidence
in calling base pair discrepancies as actual SNPs
in the dual-species assemblies and the confident identification
of heterozygosity in the individual assemblies.
For both individual assemblies and dual-species assemblies,
the SNPs reported are those conforming to the aforementioned
parameters.
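
The three heuristic thresholds can be written as a pair of filters. The sketch below is an interpretation of the stated parameters, not the SNP_Finder_Plus script itself: it treats the input as the list of base calls covering one alignment column, and it reads the 90% identity rule as "each species' reads must agree at least 90% with that species' own major allele". Function names are hypothetical.

```python
from collections import Counter


def is_candidate_snp(bases, min_depth=8, min_minor_fraction=0.30):
    """Depth and minor-allele checks for one column of base calls."""
    counts = Counter(bases).most_common()
    if len(bases) < min_depth or len(counts) < 2:
        return False
    minor_count = counts[1][1]  # second most frequent base
    return minor_count / len(bases) >= min_minor_fraction


def is_interspecific_snp(bases_by_species, min_identity=0.90, **thresholds):
    """Dual-species check: pooled column passes and each species is near-fixed."""
    pooled = [b for bases in bases_by_species.values() for b in bases]
    if not is_candidate_snp(pooled, **thresholds):
        return False
    major_alleles = set()
    for bases in bases_by_species.values():
        if not bases:
            return False  # a species with no coverage cannot be scored
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) < min_identity:
            return False
        major_alleles.add(base)
    return len(major_alleles) > 1
```

For example, a column with five A reads from one species and four G reads from another passes, whereas a column in which one species is itself split 3:2 fails the identity check.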

All genomic sequences matching the above criteria 
were used for marker development. Primer3 v2.0 [39] 
was used to identify primers for amplifying these 
markers, with the following parameters: optimal primer 
size = 20 (range = 18-27); product size range = 100-500
bp; Tm range = 50-60°C with 55°C optimum; and
maximum polynucleotide = 3. Allowing PCR products 
greater than 200 bp greatly increased the possibility of 
INDELs in the PCR products. 
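
Some of the listed design constraints are easy to check by hand. The sketch below is not Primer3 and does not reproduce its melting temperature model; it only tests primer length (18-27), the longest mononucleotide run (at most 3), and the expected product size (100-500 bp). The function names are hypothetical, and the example primer is the PS003 reverse primer as printed in Table 3.

```python
import re


def longest_homopolymer(primer):
    """Length of the longest single-base run in the primer."""
    runs = (len(m.group(0)) for m in re.finditer(r"A+|C+|G+|T+", primer))
    return max(runs, default=0)


def passes_basic_filters(primer, product_size,
                         size_range=(18, 27),
                         product_range=(100, 500),
                         max_poly_x=3):
    """Check length, homopolymer run, and product size; Tm is not evaluated."""
    return (
        size_range[0] <= len(primer) <= size_range[1]
        and longest_homopolymer(primer) <= max_poly_x
        and product_range[0] <= product_size <= product_range[1]
    )


# PS003 reverse primer and expected fragment length from Table 3.
example_ok = passes_basic_filters("CATGAAGCACTGCAAATCCA", 217)
```

A primer containing a run such as AAAA, or an amplicon outside the 100-500 bp window, would be rejected by this check.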

The PCR (SSR/INDEL) markers were validated using 
the original four species as template DNA. Each 10 µL
PCR reaction had ~30 ng genomic DNA, 0.05 mM
dNTPs, 0.1 mM cresol red, 1.0 µL of 10X PCR buffer
(Sigma-Aldrich, St. Louis, MO), 0.5 units of JumpStart™
Taq DNA Polymerase (Sigma-Aldrich, St. Louis, MO) and
0.5 µM (each) of the forward and reverse primers. The
thermal cycler (Mastercycler® Pro; Eppendorf International; 
Hamburg, Germany) was set as follows: 94°C for 30 s, 
45 cycles of 92°C for 20 s, (primer specific annealing 
temperature)°C for 1 min. 30 s, 72°C for 2 min., and 72°C 
for 7 min. (final extension). Following PCR reactions, 
DNA was loaded into 3% Metaphor® agarose (Lonza 
Corporation; Rockland, ME) gels and run using a gel 
electrophoresis box at 100 V for 2 h. Optimal 
annealing temperatures for each SSR/INDEL marker 
were selected based on clarity of bands produced over 
varying annealing temperatures. Only SSR/INDEL 
markers with one or two reproducible bands are 
reported in the marker studies (Tables 1 and 3). The same 
conditions used for marker validation were used in the 
SSR/INDEL marker studies, except gel electrophoresis 
times were increased to 4 h at 100 V. 

The gels were evaluated and scored as: 1 = marker 
present; 0 = marker absent based upon molecular weight. 
The results were then analyzed to assess the strength 
of hierarchical signal in these data using 10,000 



replications of fast bootstrapping as implemented in 
PAUP* v. 4.0b10 [40].
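
The 1/0 scoring can be illustrated by converting band sizes into a presence/absence matrix. This sketch assumes that each distinct molecular weight at a marker is treated as one binary character; the function name is hypothetical, and the example values are the PS004 band sizes of the four source species from Table 1.

```python
def score_bands(bands_by_taxon):
    """Return (sorted band sizes, {taxon: [1/0 per band size]})."""
    sizes = sorted({s for bands in bands_by_taxon.values() for s in bands})
    matrix = {
        taxon: [1 if size in bands else 0 for size in sizes]
        for taxon, bands in bands_by_taxon.items()
    }
    return sizes, matrix


# PS004 band sizes (bp) for the four sequence source species (Table 1).
ps004 = {
    "P. davidsonii": [460],
    "P. fruticosus v. fruticosus": [460],
    "P. dissectus": [440],
    "P. cyananthus v. cyananthus": [420],
}
sizes, matrix = score_bands(ps004)
```

Concatenating such columns across all 12 markers would give the kind of binary character matrix that bootstrap analyses take as input.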

Our interspecific SNP genotyping was accomplished
using Fluidigm (Fluidigm Corp., South San Francisco, CA)
nanofluidic Dynamic Array Integrated Fluidic Circuit
(IFC) Chips [40] on the EP-1™ System (Fluidigm Corp.,
South San Francisco, CA) and competitive allele-specific
PCR KASPar chemistry (KBioscience Ltd., Hoddesdon,
UK). A 5 µL sample mix, consisting of 2.25 µL genomic
DNA (20 ng µL⁻¹), 2.5 µL of 2x KBiosciences Allele Specific
PCR (KASP) reagent Mix (KBioscience Ltd.), and
0.25 µL of 20x GT sample loading reagent (Fluidigm
Corp., South San Francisco, CA) was prepared for each
DNA sample. Similarly, a 4 µL 10x KASP Assay,
containing 0.56 µL of the KASP assay primer mix (allele
specific primers at 12 µM and the common reverse primer
at 30 µM), 2 µL of 2x Assay Loading Reagent (Fluidigm
Corp., South San Francisco, CA), and 1.44 µL DNase-free
water was prepared for each SNP assay.

The two assay mixes were added to the dynamic array 
chip, mixed, and then thermal cycled using an integrated 
fluidic circuit Controller HX and FC1 thermal cycler 
(Fluidigm Corp., South San Francisco, CA). The thermal
cycler was set as follows: 70°C for 30 min; 25°C for 10 min
for thermal mixing of components, followed by hot-start
Taq polymerase activation at 94°C for 15 min, then a touchdown
amplification protocol consisting of 10 cycles of
94°C for 20 sec, 65°C for 1 min (decreasing 0.8°C per
cycle), 26 cycles of 94°C for 20 sec, 57°C for 1 min, and
then a hold at 20°C for 30 sec. Five end-point fluorescent
images of the chip were acquired using the EP-1™
imager (Fluidigm Corp., South San Francisco, CA), once 
after the initial touchdown cycles were complete and then 
after each additional run on "additional touchdown 
cycles." The extra cycles were run four times, with an 
analysis of the chip after each run. 
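
The touchdown schedule can be made explicit with a short calculation. This sketch assumes the first touchdown cycle anneals at 65°C and that the 0.8°C decrement applies to each subsequent cycle, followed by 26 cycles at a constant 57°C; the function name and default arguments are illustrative.

```python
def touchdown_annealing_temps(start=65.0, step=0.8, touchdown_cycles=10,
                              hold=57.0, hold_cycles=26):
    """Annealing temperature (°C) for every cycle of the amplification."""
    touchdown = [round(start - step * i, 1) for i in range(touchdown_cycles)]
    return touchdown + [hold] * hold_cycles


temps = touchdown_annealing_temps()
```

Under these assumptions the tenth touchdown cycle anneals at 57.8°C, so the schedule steps down smoothly onto the 57°C plateau over 36 cycles in total.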

The determination of each SNP allele was based on agreement in 
at least two of three SNP genotyping experiments. 
The primers were then analyzed for functionality 
using the results from each of the five stops for each chip, 
which were compared to determine the most accurate call. 
Functionality was determined by the number of calls versus 
no calls, and by consistency. 
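The two-of-three rule can be expressed as a small consensus function. This is an illustrative sketch only: the genotype labels, the "NC" no-call code, and the function name are assumptions, not the output format of the Fluidigm software.

```python
from collections import Counter


def consensus_call(calls, min_agree=2, no_call="NC"):
    """Return the genotype seen in at least min_agree experiments.

    calls: genotype calls for one sample and one SNP assay, one entry
    per experiment. The "NC" no-call code is a hypothetical placeholder.
    """
    counts = Counter(c for c in calls if c != no_call)
    if not counts:
        return no_call
    genotype, n = counts.most_common(1)[0]
    return genotype if n >= min_agree else no_call


if __name__ == "__main__":
    print(consensus_call(["A:A", "A:A", "A:G"]))  # two of three agree
    print(consensus_call(["A:A", "NC", "A:G"]))   # no agreement
```

With three experiments, at most one genotype can reach two votes, so the result is unambiguous whenever a call is returned.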

Cross species sequencing verification 

To evaluate the DNA sequence homology and polymorphism 
type (SSR or INDEL) at specific marker amplicons 
(Table 1) across the Penstemon genus, DNA samples from 
each of five species (P. cyananthus, P. davidsonii, P. 
dissectus, P. fruticosus, and P. pachyphyllus) were amplified 
and Sanger sequenced. We accomplished the PCR 
amplification using Qiagen HotStarTaq Plus Master 
Mix (Valencia, CA, USA) according to the manufacturer's 
recommendations. The amplification protocol consisted 



Table 3 Summary of marker characteristics including the primary SSR motif identified in the original GR-RSC (genome reduction using restriction site conservation) sequence, primer sequences, EFL (expected fragment length), total bands, and fragment sizes 

Marker name¹ | Primary motif | Forward primer (5'-3') | Reverse primer (5'-3') | GenBank accession ID | EFL | Total unique bands | Fragment size: P. cyananthus | P. davidsonii | P. dissectus | P. fruticosus 
PS003 (di, f) | (ADs | TGCCTCTGTCmACATOCAA | CATGAAGCACTGCAAATCCA | JQ966997 | 217 | 3 | 360 | 260 | 250 | 260 
PS004 (da, f) | (ATQ 6 | TGmCAATOCTGTCCACAT | TOTCTGTCCAAACGGTAGGT | JQ951613 | 476 | 3 | 420 | 460 | 440 | 460 
PS005 (c, di, f) | (GAA)6 | GCCCAACTCCGTAATOAA | AACTGCTOCCACTCGACTC | JQ966998 | 303 | 3 | 260, 300 | 260 | 280 | 280 
PS009 (c, da, f) | (TGA)6 | ACCTCGAACTTGACGGTCC | TOTGAGGAGAAACCAAGGG | JQ966999 | 466 | 4 | 370, 650 | 540 | 650 | 600 
PS011 (da, f) | (GA)8 | AAGTGCGACACTGGATGTC^ | GCAGOTCAGCTCCAGAAAT | JQ951614 | 435 | 2 | 860 | 500 | 860 | 500 
PS012 (c) | aA) 8 | TCCATATTGTAACCAACAATGACTG | TGAATGGCAAAGCGTAATCA | JQ951615 | 402 | 3 | 400 | 360 | 370 | 360 
PS013 (f) | aA) 8 | GAAGAATOAmAAACAAGATGCAA | TCAGTACGTGAGAAACTOATCAATAA | JQ967000 | 399 | 2 | 400 | 650 | 650 | 400 
PS014 (c) | acA) 5 | CGAmGGTATAGTOGATOCGA | CCITCATCACCCGGTAC^G | JQ951616 | 409 | 3 | 410 | 370 | 380 | 410 
PS015 (di) | aco 6 | GCCGAGmCAAGAAAGCAA | AATOCGACCTGCCACGC | JQ967001 | 409 | 2 | 490 | 500 | 490 | 490 
PS016 (c, di) | (CDs | CATGGCCCmCTCACACT | GACGCGGTOGCTATACAGT | JQ967002 | 447 | 3 | NM² | 1,100 | 1,060 | 1,030 
PS017 (da, di) | (AG)9 | GAAGGCTOGCATAAATCCTCAAA | ATOGGCTCCCACGAACAAA | JQ951617 | 455 | 2 | 750 | 700 | 750 | 700 
PS019 (c, di) | (AG)8 | AATCCCACAGCCCATACAAA | TGAATOAGTCCTATACCCTAmCAA | JQ967003 | 473 | 1 | 380 | 380 | 380 | 380 
PS021 (f) | (CDs | cmagotagctggaatacacg^ | aga^otgcatcacagtcaa™ | JQ967004 | 386 | 3 | 350 | 450 | 450 | 420 
PS023 (da) | (AG)8 | GCTGGAGAATAACATGGCG | CCATCTOCAAGTCCATACG | JQ967005 | 469 | 4 | 310 | 480 | 120, 740 | 480 
PS024 (da, f) | (CTG)6 | CTOTOCCCTGTGCCTCT | CCACCACCAACAACAACAAC | JQ967006 | 403 | 2 | 430 | 430 | 400 | 430 
PS025 (c, di) | ac) 9 | GCACATGAATGAAGGAATGC | ACGATCTGTGAAGGAACCCA | JQ967007 | 440 | 3 | 440 | 410 | 440 | 400 
PS026 (c, da, di, f) | (CT0 6 | ACTOATAATGCCTCOTGTGTCA | TOCGCAACGTOTAmGA | JQ967008 | 465 | 1 | 460 | NM | 460 | 460 
PS028 (di) | (AC)9 | GGGAGGCAGGTAACAACAAA | TACCTCTGCCGAACTGGA^ | JQ967009 | 316 | 4 | 950 | 400, 460 | 320 | 400, 460 
PS029 (di) | aA) 8 | ACCAAGTOTOGATGmGG | GGmGGAATGAGACTOGAAGGA | JQ967010 | 440 | 3 | 840 | 500 | 500 | 420 
PS032 (c, di) | (GT)9 | ACAAAGTCTCCTCAATCGCC | GCATGTACCGTGCACACACT | JQ951618 | 328 | 2 | 370 | 370 | 370 | 340 
PS034 (c) | (AC)9 | CCAAACAAATCAAACAGCACTC | CATGCGAATCAGTGTOCTAA | JQ951619 | 322 | 5 | 310, 340 | 320, 950 | 320 | 360 
PS035 (da, f) | ao 9 | TOCACAGCTACmGGCAT | ATCTGTCCAAGGCATGGAAT | JQ951620 | 486 | 3 | 630 | 520 | 920 | 520 
PS036 (c, di) | aA) 8 | TOCTAAmGGTAGCTGCAATC | TCCGAGGAACTATOCCA^ | JQ967011 | 405 | 3 | 770 | 770, 820 | 590 | 770 
PS038 (c, da) | aA) 8 | GTAATOOTCGGCAGmGTOAm | GGTGCGACCTAATOCGmCTAT | JQ967012 | 100 | 1 | NM | 100 | 100 | NM 
PS040 (da) | (CA)9 | TAAAGAGGCTOAGCGCGG | ACCTGAAGAGCTGCGGAGTA | JQ967013 | 399 | 3 | 380 | 390 | 410 | 390 
PS041 (c, da, di, f) | (A^ 8 | mCCGCAAGAGAAGAGCAT | OTGTGCACGATOCATOT | JQ967014 | 249 | 3 | 270 | 670 | 270 | 240 
PS045 (c, da) | (CT)8 | GCCACATACATGAAACGTGAA | CGAACTCTOTGTGmCTCCC | JQ967015 | 366 | 4 | 460 | NM | 440 | 120, 400 
PS047 (c, di, f) | (AQ 8 | ACACGACATCGmCAGCAA | GCGTATGGAGAGAmGGGA | JQ967016 | 428 | 3 | 470, 510 | 440 | 470 | 470 
PS048 (c, di) | (CA)9 | GCATOGATGCCGAAATATCTACAA | TGCCTGTAGGTOAmCCm | JQ951621 | 436 | 3 | 420 | 440 | 380 | 420 
PS049 (c, da, di, f) | (AG)8 | CCCATCAATAAAGAAAGAAAGAAAGA | GGTGAAACCCTGTCCTAAACC | JQ967017 | 436 | 2 | 460 | 460 | 1,000 | 460 
PS050 (c, di) | (A^ 9 | GTGTAACCTCTGAACAAGmACTGAA | TGCAGTGAGCCATGCTATC | JQ967018 | 434 | 2 | 480 | 460 | 480 | 460 
PS051 (c, di, f) | (tq 8 | TGTAACACGACAAmAACTCmCA | CGAGAACTCmCCGAGAACC | JQ967019 | 352 | 1 | 280 | NM | 280 | 280 
PS052 (c, da, di, f) | (AC)9 | CGCGGTCAATOTGAAATCT | TGAOTCCTCTCTCTCTCTCACAC | JQ951622 | 206 | 1 | 220 | 220 | 220 | 220 
PS053 (di) | (AQ 8 | AATCATAGTCTCGAGCGCGT | GAGATAAATTAGATCAGCGCATCA | JQ951623 | 410 | 3 | 160, 320 | 320 | 320, 450 | 320 
PS054 (c, da, f) | (GA)8 | TCGTOAGCAATCTCGGAGC | TCGACTGGAGAGCAAAGCA | JQ967020 | 192 | 3 | 200 | 200 | 180 | 190 
PS055 (f) | (AG)8 | TGTGGTCCGGTOCATAAAC | mGTCTCCCTAATATGTGTGATGAT | JQ967021 | 412 | 4 | 960 | 500 | 1,040 | 470 
PS056 (da, di) | (tq 8 | CATGmCAGGATTGGGOT | CGGTOCACACAGGTOTOA | JQ967022 | 319 | 4 | 690 | 450 | 230 | 340 
PS057 (da, f) | (A^ 8 | TGCCTAATGGACCTGATCCT | CCCAATOmGAAGAAAGAACA | JQ967023 | 402 | 2 | 570 | 440 | 570 | 440 
PS058 (da) | (A^ 9 | GTGCAACCAATGCAACTAATO | TCTCTCAmCCAATGAmCTCA | JQ967024 | 469 | 1 | NM | 720 | NM | 720 
PS059 (di) | (CDs | CATCAATOACACACAAGCAGA | TCGAATCTOAAGAAACACATCCA | JQ967025 | 312 | 2 | 930 | 340 | 340 | 340 
PS060 (c, di) | (AC)9 | CCATGAGAAGTAGATGACTGGGA | TOTAATOTGATOACmXCTCG^ | JQ967026 | 484 | 2 | 560 | 560 | 560 | 540 
PS061 (da, f) | aA) 8 | CGACCAATCATCAACCAACA | GACGGGCAGAATAATTGGAA | JQ967027 | 453 | 3 | 480 | 480, 530 | 450, 480 | NM 
PS062 (c, di) | (TA)9 | tggagagggtacgaaagtgc | caacgatcga™™gcacca | JQ967028 | 320 | 2 | 350 | 290 | 350 | 290 
PS064 (c, da, f) | (AG)8 | ATGGATGCCCTATGGGTACA | TGAAATGGAGGGAGTAATATAAACAA | JQ967029 | 437 | 4 | 490 | 500, 680 | 470 | 470 
PS066 (di) | (GA)9 | CAAGGATGCAGGCTCTCA^ | CTCTGCTCGTCGTAGTGCAA | JQ967030 | 434 | 2 | 250 | 480 | 250, 480 | 480 
PS068 (c, da, di, f) | (GA)8 | mGGGATGCAmCTCCAC | TCAAAGTGACATOTCCAACAAA | JQ967031 | 463 | 2 | 500 | 500 | 480 | 480 
PS069 (di) | (GT)8 | CATOGGTCAGAmGGOT | GCmCAGmGTATAmGTGCC | JQ967032 | 309 | 4 | 220 | 210 | 390 | 350 
PS071 (c, di) | (AT)8 | aagatggccctgatctgto | togtgggagtocaaa™ | JQ967033 | 446 | 1 | NM | NM | 490 | NM 
PS074 (c, da, di, f) | (AAG)6 | AGAAATCTCGCTCTCCACGA | CGACAACCTOGTGATCGCm | JQ967034 | 168 | 1 | 170 | 170 | 170 | 170 
PS075 (c, da, di, f) | (TA)8 | CACCACmCGCAGCAmA | CAAATOCATOTOTATGGAAACACG | JQ951624 | 120 | 2 | 160 | 140 | 140 | 140 
PS076 (c, di) | (GTG)6 | CTGACAGCAACATGAACATGAA | CAATCmGCCAAmCCCA | JQ967035 | 161 | 1 | 170 | 170 | 170 | 170 

¹ Parentheses indicate the species possessing sequence from which primers were designed (c = P. cyananthus, da = P. davidsonii, di = P. dissectus, f = P. fruticosus). 

² NM - no marker. 
of an initial denaturation step of 5 min at 95°C, followed 
by 40 cycles of amplification consisting of 30 sec of denaturation 
at 94°C, 30 sec of primer annealing at 55°C, and 
1 min of extension at 72°C. PCR products were separated 
on 1% agarose gels run in 0.5X TBE and visualized 
by ethidium bromide staining and UV transillumination. 
PCR products were purified using a standard ExoSAP 
(Exonuclease I/Shrimp Alkaline Phosphatase) protocol 
and sequenced directly as PCR products. DNA sequencing 
was performed at the Brigham Young University DNA 
Sequencing Center (Provo, UT, USA) using standard 
ABI Prism Taq dye-terminator cycle-sequencing 
methodology. DNA sequences were analyzed, assembled, 
and aligned using Geneious software (Biomatters, 
Auckland, New Zealand). 

Gene ontology 

We used BLASTX [41] on assembled sequences of all 
four species to compare with the GenBank refseq-protein 
database [42] with an e-value threshold of <1.0e-15. Blast2GO 
(v2.4.2) was used to map the blast hits and annotate them 
to putative cellular components, biological processes, and 
molecular functions found in the blast database [43]. For 
species comparisons, the GO level 3 was used for cellular 
components and level 2 was used for both biological 
processes and molecular functions. 

Assembled sequences of all four species were also 
compared to all available Antirrhinum and Mimulus 
(genera more or less related to Penstemon) genes on 
GenBank (downloaded 23 June 2011). Comparisons were 
made using BLASTN [41] with an e-value threshold 
of <1.0e-13. 
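An e-value cut-off of this kind is straightforward to apply to tabular BLAST output. The sketch below assumes the standard 12-column tab-delimited layout (BLAST+ `-outfmt 6`, with the e-value in the eleventh column); the text does not say which output format was used, and the contig and subject names in the example are hypothetical.

```python
import csv
import io


def filter_hits(tabular_text, max_evalue):
    """Keep hits whose e-value is below max_evalue.

    Assumes 12-column tabular BLAST output: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue,
    bitscore. Returns (query, subject, evalue) tuples.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < max_evalue:
            kept.append((row[0], row[1], evalue))
    return kept


if __name__ == "__main__":
    # Two made-up hits: only the first passes the 1.0e-15 threshold.
    example = (
        "contig1\trefA\t90.0\t100\t5\t0\t1\t100\t1\t100\t1e-20\t200\n"
        "contig2\trefB\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-5\t50\n"
    )
    for hit in filter_hits(example, 1.0e-15):
        print(hit)
```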

Results and discussion 

Genome reduction, pyrosequencing and species 
assemblies 

Given that a full 454 pyrosequencing plate using Titanium 
reagents is capable of producing 1.3 million reads 
averaging ~400 bp each [25], we expected a half plate to 
produce approximately 250 Mbp from 650,000 reads. Our 
reaction produced 287 Mbp from 733,413 reads, 20% 
more than expected, with an average read length of 
392 bp. In total, 93.8, 46.4, 48.8, and 53.3 Mbp were 
sequenced from P. cyananthus, P. dissectus, P. davidsonii, 
and P. fruticosus, respectively, closely resembling the 
2:1:1:1 ratio of DNA pooled from each species for sequencing 
(Table 4). Likewise, from our de novo assemblies, we 
identified nearly twice as many contigs in P. cyananthus 
(9,714) as in P. fruticosus (4,777), for 
example, which was expected because we sequenced 
approximately twice as much DNA from P. cyananthus 
as from each of the other three species. Of the P. 
cyananthus genome, 0.6% was represented, compared to 0.5% average 
coverage of the other three species (Table 4); thus, 
essentially equal genome representation from each species 
was realized using the GR-RSC technique by pooling 
approximately equal genome molar concentrations in the 
sequencing reaction. The contigs of this study have been 
deposited at DDBJ/EMBL/GenBank as a Whole Genome 
Shotgun project under the accessions AKKG00000000 (P. 
cyananthus), AKKH00000000 (P. dissectus), AKKI00000000 
(P. davidsonii), and AKKJ00000000 (P. fruticosus). The 
version described in this paper is the first version for each 
accession, XXXX01000000. 

DNA sequences produced by the GR-RSC technique 
represent a broad sample of the genome. With this sample, 
we can begin to estimate genome-wide characteristics, such 
as GC content, frequency of repeat elements, and so forth. 
From the genome reduction, GC content was measured to 
be 36.4%, 34.5%, 35.3%, and 35.15% for P. cyananthus, 
P. dissectus, P. davidsonii, and P. fruticosus, respectively 
(Table 4), matching the average 35% GC content reported 
for dicots [44]. Using the dicot average GC content a 
priori, we estimated a theoretical frequency of the BfaI 
and EcoRI recognition sites. The theoretical GC content 
in combination with estimated genome sizes of the four 
species [5] suggested the GR-RSC should have rendered a 
104-fold reduction of the genome of each species. With a 
reduced genome of these species, the 650,000 reads that 
were sequenced suggest an average of 11x coverage; 
however, the observed read depth was 8.5x, 22.7% less 
than expected (Table 4). This lighter coverage is partly due 
to the lower than expected specificity of reads. An average 
of 48.2% of the reads were matched to contigs, with the 
other 51.8% either too short or lacking in homology to 
successfully match to a contig (Table 4). 
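The expected frequency of a restriction site at a given GC content can be approximated with a simple independent-base model. The sketch below is an illustration under that assumption and is not necessarily the calculation the authors performed; it uses the standard recognition sequences CTAG (BfaI) and GAATTC (EcoRI), and it also reproduces the 22.7% coverage shortfall quoted above.

```python
def site_probability(site, gc_content):
    """Probability that a random position starts the given site.

    Assumes bases are independent, with G and C each at gc_content / 2
    and A and T each at (1 - gc_content) / 2.
    """
    base_p = {
        "G": gc_content / 2, "C": gc_content / 2,
        "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2,
    }
    prob = 1.0
    for base in site:
        prob *= base_p[base]
    return prob


def coverage_shortfall(expected, observed):
    """Fractional shortfall of observed coverage relative to expected."""
    return (expected - observed) / expected


if __name__ == "__main__":
    gc = 0.35  # dicot average GC content used a priori in the text
    for enzyme, site in (("BfaI", "CTAG"), ("EcoRI", "GAATTC")):
        p = site_probability(site, gc)
        print(f"{enzyme} ({site}): about one site per {1 / p:.0f} bp")
    print(f"coverage shortfall: {coverage_shortfall(11, 8.5):.1%}")
```

Because both sites are AT-rich, a GC content below 50% makes them more frequent than the 1/256 and 1/4,096 expected under equal base frequencies.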

The full assembly of all four Penstemon species, using the 
Newbler de novo assembler, produced a total of 44,966 
contigs, representing 16.4 Mbp, or 5.7% of our total 
sequence. In the individual species assemblies of P. 
cyananthus, P. dissectus, P. davidsonii, and P. fruticosus, a 
total of 9,714, 5,364, 4,882, and 4,777 contigs were created 
representing 4.6, 2.6, 2.4, and 2.3 Mbp of assembled bases 
respectively. These contigs represent, on average, 0.5% of 
the total genomes being sequenced (Table 4). 
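The "% genome represented" values in Table 4 follow from dividing the bases in each assembly by the corresponding genome size. A minimal sketch using the figures from Table 4 (the dictionary and function names are introduced here for illustration):

```python
# Genome sizes (Mbp) and bases in assembly, both taken from Table 4.
GENOME_SIZE_MBP = {
    "P. cyananthus": 893,
    "P. dissectus": 462,
    "P. davidsonii": 483,
    "P. fruticosus": 476,
}
ASSEMBLED_BASES = {
    "P. cyananthus": 4_623_755,
    "P. dissectus": 2_629_819,
    "P. davidsonii": 2_376_141,
    "P. fruticosus": 2_322_606,
}


def percent_genome_represented(assembled_bases, genome_size_mbp):
    """Assembled bases as a percentage of the genome size."""
    return 100.0 * assembled_bases / (genome_size_mbp * 1_000_000)


if __name__ == "__main__":
    for species, size in GENOME_SIZE_MBP.items():
        pct = percent_genome_represented(ASSEMBLED_BASES[species], size)
        print(f"{species}: {pct:.2f}% of genome represented")
```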

Marker analysis 

We utilized assembly contigs from genomic sequence of 
all four species with "masked" multiple repeats, such as 
transposons, to identify SSRs. Penstemon cyananthus, P. 
dissectus, P. davidsonii, and P. fruticosus had 97, 113, 49, 
and 58 SSRs identified, respectively (Table 5). More SSRs 
were identified in P. dissectus than in P. cyananthus, 
which has a 1.9 times larger genome and a higher representation 
of sequence than P. dissectus (Table 5). This 
inverse relationship between genome size and SSR 
content agrees with observations in other plant genomes 
[45]. Some SSRs were found as putative homologs in 


Table 4 Summary data from 454-pyrosequencing and Newbler de novo assembly (v.2.0.01) of Penstemon cyananthus, P. dissectus, P. davidsonii, and P. fruticosus 

Assembly | Genome size (Mbp)¹ | GC content | Reads | Bases² | % Reads assembled | % Bases assembled | Contigs created | Bases in assembly | % Genome represented | Average coverage | Bases shared between assemblies 
P. cyananthus | 893 | 36.4% | 199,329 | 87,753,792 | 53.1% | 50.0% | 9,714 | 4,623,755 | 0.5% | 7.7X | - 
P. dissectus | 462 | 34.5% | 98,868 | 43,304,550 | 52.8% | 50.9% | 5,364 | 2,629,819 | 0.6% | 8.2X | - 
P. davidsonii | 483 | 35.3% | 103,963 | 45,599,742 | 45.8% | 43.5% | 4,882 | 2,376,141 | 0.5% | 9.1X | - 
P. fruticosus | 476 | 35.2% | 113,146 | 49,786,980 | 41.0% | 38.8% | 4,777 | 2,322,606 | 0.5% | 8.9X | - 
P. cyananthus x P. dissectus | - | - | 298,197 | 131,058,342 | 53.0% | 50.1% | 14,523 | 6,915,079 | - | - | 338,495 
P. cyananthus x P. davidsonii | - | - | 303,292 | 133,353,534 | 49.9% | 46.9% | 14,254 | 6,757,023 | - | - | 242,873 
P. cyananthus x P. fruticosus | - | - | 312,475 | 137,540,772 | 47.8% | 44.9% | 14,134 | 6,705,536 | - | - | 240,825 
P. dissectus x P. davidsonii | - | - | 202,831 | 88,904,292 | 48.3% | 46.1% | 10,053 | 4,855,491 | - | - | 150,469 
P. dissectus x P. fruticosus | - | - | 212,014 | 93,091,530 | 45.7% | 43.5% | 9,873 | 4,774,539 | - | - | 177,886 
P. davidsonii x P. fruticosus | - | - | 217,109 | 95,386,722 | 44.0% | 41.7% | 9,184 | 4,442,194 | - | - | 256,553 
Full Penstemon Assembly | - | - | 730,215 | 265,987,500 | 47.9% | 46.4% | 44,966 | 16,363,589 | - | - | - 

¹ The diploid (2n = 2x = 16) genome size as reported by Broderick et al. [5]. 

² Bases denotes the total number of bases used to create the assembly and not the total number of bases sequenced. 
Table 5 Data obtained from MISA (SSR), Blast2GO (GO) and RepeatMasker (RM) 

Analysis | Measure | P. cyananthus | P. dissectus | P. davidsonii | P. fruticosus 
SSR | Total SSRs¹ | 97 | 113 | 49 | 58 
SSR | SSRs/assembly length | 2.1E-05 (~1/48,000) | 4.3E-05 (~1/23,000) | 2.1E-05 (~1/48,000) | 2.5E-05 (~1/40,000) 
SSR | Repeat type: di- | 44.3% | 40.7% | 46.9% | 48.3% 
SSR | Repeat type: tri- | 45.4% | 43.4% | 44.9% | 41.4% 
SSR | Repeat type: tetra- | 10.3% | 15.9% | 8.2% | 10.3% 
GO | Contigs analyzed | 9,714 | 5,364 | 4,882 | 4,777 
GO | Blast hits found² | 1,899 | 1,125 | 1,121 | 1,091 
GO | Annotated hits | 1,430 | 844 | 388 | 826 
GO | % Blast hits | 19.5% | 21.0% | 23.0% | 22.8% 
GO | % Annotated | 14.7% | 15.7% | 7.9% | 17.3% 
RM | Masked repeat elements | 28.5% | 16.8% | 17.4% | 16.1% 
RM | Retroelements (LTR) | 7.8% | 3.0% | 4.9% | 4.6% 
RM | DNA transposons | 0.3% | 0.9% | 1.0% | 1.0% 
RM | Other repeats³ | 20.4% | 12.9% | 11.6% | 10.5% 

¹ For MISA, "unmasked" individual species assemblies were used. 

² Sequence compared to the GenBank refseq-protein database with an e-value threshold of <1.0e-15. 

³ Other repeats includes: LINEs, unclassified repeats, satellites, simple repeats, and low complexity sequence. 



multiple species; after eliminating redundancies, we tallied 
133 unique SSRs (Table 3). We generated primer pairs surrounding 
77 of these SSRs large enough to potentially capture 
INDELs; of these, 51 produced 1 or 2 reproducible 
bands with no or few faint superfluous bands. From those 
51, there was an overall success rate of 94%, with 42 (82%) 
being polymorphic between the four species (Table 3). 
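The SSR densities reported in Table 5 can be recomputed from the SSR counts and the assembly lengths in Table 4. This is a minimal sketch under the assumption that "assembly length" means the "bases in assembly" column of Table 4; the names are introduced here for illustration.

```python
# SSR counts (Table 5) and bases in each species assembly (Table 4).
SSR_COUNTS = {
    "P. cyananthus": 97,
    "P. dissectus": 113,
    "P. davidsonii": 49,
    "P. fruticosus": 58,
}
ASSEMBLY_BASES = {
    "P. cyananthus": 4_623_755,
    "P. dissectus": 2_629_819,
    "P. davidsonii": 2_376_141,
    "P. fruticosus": 2_322_606,
}


def ssr_density(n_ssrs, assembly_bases):
    """SSRs per assembled base."""
    return n_ssrs / assembly_bases


if __name__ == "__main__":
    for species, n in SSR_COUNTS.items():
        d = ssr_density(n, ASSEMBLY_BASES[species])
        print(f"{species}: {d:.1e} SSRs/bp (about 1 per {1 / d:,.0f} bp)")
```

On these figures P. dissectus carries roughly twice the SSR density of P. cyananthus despite its smaller genome, which is the inverse relationship noted in the text.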

To assess the possibility of utilizing these markers in 
interspecific plant improvement studies, 12 of the 51 
SSR/INDEL markers (Table 3) were tested on 93 mostly 
xeric Penstemon taxa (72 species [Table 1]) representing 
five of six subgenera recognized in the genus [14]. The 
overall success rate of the markers was 98%, with 100% 
being polymorphic across the 93 taxa. Without sequencing 
each band and/or doing inheritance studies on each marker, 
it is not possible to clearly determine whether a polymorphism of 
a given marker is a variant of an allele or a new locus. 
However, we did amplify and sequence the amplicon 
produced at 11 of these markers in five Penstemon species 
(P. cyananthus, P. davidsonii, P. dissectus, P. fruticosus, and 
P. pachyphyllus). P. pachyphyllus var. pachyphyllus represents 
the largest subgenus (Penstemon) in the genus. These 
five species represented four of the presently classified six 
Penstemon subgenera. Of the 55 attempted sequences, 60% 
produced high quality sequence results which could be 
compared to the original 454 contigs containing the 
microsatellites. Using BLASTN (v2.2.25+) [41] we found 
that 33 sequences matched the respective microsatellite-containing 
contigs from which the SSR/INDEL markers 
were derived with an e-value of no more than 1.0e-36. An 
example of the types of polymorphism (SSRs and INDEL) 
found at these loci across the various species is represented 
graphically for the marker PS035 (Figure 1). For 22 
(40%) of the 55 attempted sequences, we were unable 
to obtain high quality sequence information. In the 
majority of these cases (94%) the lack of high quality 
data was clearly due to the amplification of multiple 
amplicons (seen as multiple bands in gel electrophoresis), 
which impeded the sequencing of the PCR products. 
The source of the multiple amplicons may be 
heterozygosity at the locus or the amplification 
of paralogous loci. 

Both the sequence data (Figure 1) as well as the 
marker size data (Tables 1 and 3) are clear evidence of 
sequence conservation, and probable homologous loci, 
in many of the SSR/INDEL markers. Marker PS012, 
apparently the most conserved marker, had six unique 
molecular weight bands and was present in all 93 taxa. 
The marker with the most diversity in its molecular 
weights was PS011, which had 18 variants and was not 
readable in seven of the 93 taxa. Of the 1,116 possible 
marker x taxa interactions, 22 (2.0%) did not produce 
reliable data. Seven of those 22 (0.5%) lacked 
any product, with the remaining 15 producing multiple 
bands (reported as ambiguous data). Clearly readable 
double bands were found in 135 of the 1,116 (12.1%) 
marker x taxa interactions (Table 1). 

Our data suggest a high degree of sequence conservation 
across the genus, favoring the present hypothesis of a 
recent and rather rapid evolutionary radiation of the genus 
[13,14]. Furthermore, our data agree with Morgante et al. 
[45], who suggest that SSRs in non-coding 
sequence are highly conserved and predate recent genome 
expansions of many plants. Some of our markers differed in 
length by as much as 570 bp (Tables 1 and 3) suggesting 
the presence of INDELs and possibly additional SSRs 
(Table 3). We confirmed the presence of INDELs in the 
sample of 11 markers which we sequenced (Figure 1). In 
some instances, these large fragment length variances 
may be amplifying a different locus, which is a recognized 
concern when using SSR based markers above the species 
level [46,47]. INDELs are useful as PCR based markers 
since they, like SSRs, are codominant and abundant in the 
genome and are commonly used in genetic mapping [26]. 
By combining the SSRs we identified in the source 
sequence for each of these markers with potential INDELs, 
alleles will be easily and inexpensively identified by gel 
electrophoresis. 

To assess the possibility of phylogenetic (i.e., hierarchical) 
structure of the variation within these SSR/INDEL data at 
the broad taxonomic scale of our survey, we analyzed the 
12 marker data set (Table 1) with PAUP*. Fast bootstrapping 
recovered a largely unresolved topology, suggesting 
rampant homoplasy or that one or more of these markers 
represent more than one locus. These results are similar to 
what others have reported about SSR type markers. SSRs 
have demonstrated utility for population and intraspecific 
relationships, such as cultivar differentiation; however, they 
can be problematic when used to reconstruct relationships 
above the species level where length differences are 
expected to poorly reflect homology [47,48]. Nonetheless, 
with over 96% of these SSR/INDEL regions being 
conserved across Penstemon, these markers have potential 
for studies of interspecific hybridization and cultivar 
development. 

Interspecific Penstemon breeding is complex [7,11,15,49]; 
thus, having a set of inexpensive and easily used SSR/INDEL 
markers, which amplify across the genus, will 
have utility in understanding the results of some wide 
crosses. Empirical studies of various Penstemon interspecific 
crosses have reported outcomes ranging from a clearly recognizable 
phenotype intermediate between the two parents to the F1 
essentially mimicking one of the two parents, usually 
the female parent. Furthermore, in some instances 
the F2s and additional generations continue to 
mimic the female parent to the point that Viehmeyer 
[49] began to question if apomixis was involved. An example 
of this phenomenon was a 'Flathead Lake' x P. 
cobaea interspecific cross. It was not until the hybrid 
progeny of this cross were crossed with other interspecific 
hybrids that the progeny gave a much wider range of 
phenotypes [49]. A probable reason for this phenomenon 
is "unequal segregation," which has been described in 
other wide crosses [50,51]. Thus, through the use of 
these SSR/INDEL markers, regions of the genome that carry 
unusual genotypic combinations for that specific cross can 
be identified and selections made accordingly 
[51-54], thereby increasing the number of 
unique genotype/phenotype plants to be grown out to 
maturity from thousands of seedlings. Since many 
Penstemon require two years before their first anthesis, 
using markers to identify the greatest number of 
genotypically diverse plants is potentially very useful in 
the breeding of this crop. 

Beyond amplification ability, we also assessed the composition 
and trends of all SSRs identified. On average, 
adenine and thymine rich repeat motifs were the most 
common repeat type in the di-, tri-, and tetra-nucleotide 
repeat motifs (Figure 2). In general, AT motifs are the 
most common motifs in noncoding regions of most 
plant genomes [45]. More variation was observed in the 
repeat motifs in the tetra-nucleotide repeats across the 
four species. Even closely related P. fruticosus and P. 
davidsonii had completely distinct tetra-nucleotide repeat 
motifs (Figure 2). This is likely due, in part, to the 
rarity of the motifs and high number of possible nucleotide 
combinations. Several studies have found that the 
hypothetical origins of some SSRs are retrotransposition 
events [48,55,56] and, as such, may be useful in developing 
part of a unique "fingerprint" for a given species. 
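For readers unfamiliar with how di-, tri-, and tetra-nucleotide SSRs are located in sequence, the idea can be sketched with regular expressions. This is not MISA and does not reproduce its output; the minimum repeat counts below are hypothetical values chosen for illustration, not the settings used in this study.

```python
import re

# Hypothetical minimum repeat counts per motif length (not the study's settings).
MIN_REPEATS = {2: 8, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit.

    Rejects, for example, "AA" (a mononucleotide run) and "AGAG"
    (really the dinucleotide "AG").
    """
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for unit_len, n_min in min_repeats.items():
        # One captured unit followed by at least n_min - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n_min - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "TT" + "AG" * 8 + "CC"
    for start, motif, count in find_ssrs(example):
        print(f"({motif}){count} at position {start}")
```

The primitive-motif check matters because the tetra-nucleotide pattern would otherwise report an (AG)8 run a second time as (AGAG)4.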

SNP analysis 

Using our SNP discovery parameters of an 8x minimum 
coverage and 30% representation of the minor allele, we 
identified an average of one SNP per 2,890 bp across the 
four species ranging from P. cyananthus (1/1,855 bp) to 
Figure 2 Simple sequence repeat (SSR) motif distributions identified in each of the four Penstemon (P. cyananthus, P. dissectus, 
P. davidsonii, and P. fruticosus) sequences using the program MISA. 



P. fruticosus (1/3,777 bp). The three species with similar 
genome sizes all had similar SNP frequencies (Table 6). 
As reported in other plant species [57,58], we found that 
transitions (A<->G or C<->T) were 
more common than transversions (A<->T, A<->C, 
G<->C, G<->T) in Penstemon by an average factor of 1.5 
(Table 6). This is close to the 1.4 factor in Arabidopsis 
[35]. In the dual species assemblies, using the same 
parameters and a 90% SNP identity, the average transition 
to transversion ratio was lower at 1.2 (Table 6). 
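The SNP frequencies in Table 6 can be recomputed from the SNP counts and the assembly lengths in Table 4. In the sketch below, the reported averages (one SNP per 2,890 bp within species and one per 97 bp between species) appear consistent with taking the mean of the per-assembly rates and then inverting it; that reading is an inference from the numbers, not a method stated in the text. The value 177,886 is used for the P. dissectus x P. fruticosus shared bases.

```python
# (SNP count from Table 6, assembly length in bp from Table 4).
WITHIN_SPECIES = {
    "P. cyananthus": (2493, 4_623_755),
    "P. dissectus": (737, 2_629_819),
    "P. davidsonii": (713, 2_376_141),
    "P. fruticosus": (615, 2_322_606),
}
# For species pairs, the length is the bases shared between assemblies.
BETWEEN_SPECIES = {
    "cyananthus x dissectus": (3253, 338_495),
    "cyananthus x davidsonii": (1958, 242_873),
    "cyananthus x fruticosus": (2015, 240_825),
    "dissectus x davidsonii": (2348, 150_469),
    "dissectus x fruticosus": (2133, 177_886),
    "davidsonii x fruticosus": (2156, 256_553),
}


def bp_per_snp(n_snps, length_bp):
    """Average spacing between SNPs in base pairs."""
    return length_bp / n_snps


def mean_bp_per_snp(pairs):
    """Invert the mean of the per-assembly SNP rates (SNPs per bp)."""
    rates = [n / length for n, length in pairs]
    return 1 / (sum(rates) / len(rates))


if __name__ == "__main__":
    for name, (n, length) in {**WITHIN_SPECIES, **BETWEEN_SPECIES}.items():
        print(f"{name}: 1 SNP per {bp_per_snp(n, length):.0f} bp")
    print(f"within-species mean: 1 per {mean_bp_per_snp(WITHIN_SPECIES.values()):.0f} bp")
    print(f"between-species mean: 1 per {mean_bp_per_snp(BETWEEN_SPECIES.values()):.0f} bp")
```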

In the dual species assembly, we found an average 
of 1 SNP/97 bp between homologous sequence assemblies 



Table 6 SNP type and distributions along with SNP comparisons of sequences found within and between species (homologous sequence comparisons) using SNP_Finder_Plus (8X min. coverage, 30% min. minor allele, 90% min. identity) 

Species assembly | SNPs | Average coverage | SNPs/assembly length¹ | A<->C | A<->G | A<->T | C<->G | C<->T | G<->T 
P. cyananthus | 2,493 | 16.4 | 0.000539 (~1/1855 bp) | 10.7% | 29.5% | 13.9% | 4.3% | 30.2% | 9.5% 
P. dissectus | 737 | 14.3 | 0.000280 (~1/3568 bp) | 9.8% | 30.7% | 15.6% | 4.6% | 27.4% | 9.8% 
P. davidsonii | 713 | 14.4 | 0.000300 (~1/3333 bp) | 11.9% | 26.4% | 15.2% | 3.9% | 28.3% | 11.8% 
P. fruticosus | 615 | 12.4 | 0.000265 (~1/3777 bp) | 11.7% | 27.2% | 17.9% | 4.2% | 25.4% | 12.0% 
Homologous sequence comparisons 
P. cyananthus x P. dissectus | 3,253 | 10.6 | 0.009610 (~1/104 bp) | 11.7% | 27.5% | 16.0% | 7.1% | 27.1% | 10.6% 
P. cyananthus x P. davidsonii | 1,958 | 10.7 | 0.008062 (~1/124 bp) | 11.1% | 27.6% | 15.8% | 7.1% | 28.5% | 9.9% 
P. cyananthus x P. fruticosus | 2,015 | 10.6 | 0.008367 (~1/119 bp) | 10.6% | 27.2% | 16.7% | 6.8% | 28.7% | 10.1% 
P. dissectus x P. davidsonii | 2,348 | 10.8 | 0.015605 (~1/64 bp) | 12.6% | 26.7% | 15.5% | 7.5% | 27.3% | 10.4% 
P. dissectus x P. fruticosus | 2,133 | 10.0 | 0.011991 (~1/83 bp) | 12.0% | 26.4% | 16.5% | 7.6% | 27.2% | 10.4% 
P. davidsonii x P. fruticosus | 2,156 | 10.1 | 0.008404 (~1/119 bp) | 12.8% | 28.2% | 14.5% | 7.2% | 27.2% | 10.1% 

¹ Assembly length is bases shared between assemblies (see Table 4). 
of any two of the four species. The frequency of SNPs between 
homologous sequences of P. dissectus and P. 
davidsonii was the highest at 1/64 bp, with the lowest being 
between P. cyananthus and P. davidsonii at 1/124 bp. 
These results are in line with previous molecular based 
studies [5,14]. Penstemon davidsonii and P. fruticosus both 
belong to subgenus Dasanthera, while P. cyananthus and 
either P. davidsonii or P. fruticosus homologous sequences 
had fewer SNPs at 1/124 and 1/119, respectively. All 
homologous sequence comparisons involving P. dissectus 
had the highest density of SNPs (Table 6), suggesting 
that P. dissectus is the most evolutionarily distant of the 
four species. 

For a high degree of confidence in the results when 
using the "SNP identity" parameter in 
SNP_Finder_Plus, it is important to have two or more independent 
samples from the same species. This requirement was 
not met for each of the species assemblies, thus introducing 
a weakness in our interspecific SNP comparisons. 
However, with the parameters of a minimum 8x coverage 
and a minor allele frequency of at least 30%, a putative 
SNP must be present in at least three of the eight contig 
reads, thus providing some protection from mislabeling a 
sequencing and/or assembly error as a SNP. Furthermore, 
when doing across-species comparisons the average 
SNP coverage was actually 14.4x (Table 6). Therefore, 
on average, five identical putative SNPs represented 
the minor allele. 
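The read counts quoted above follow from rounding up the product of coverage and the minimum minor allele fraction. A minimal sketch of that arithmetic (the function name is introduced here for illustration; exact fractions are used to avoid floating-point rounding at whole-number boundaries):

```python
import math
from fractions import Fraction


def min_minor_allele_reads(coverage, min_minor_fraction="0.30"):
    """Smallest whole number of reads that meets the minor allele threshold.

    coverage and min_minor_fraction are converted to exact fractions
    via their string form before multiplying.
    """
    required = Fraction(str(coverage)) * Fraction(str(min_minor_fraction))
    return math.ceil(required)


if __name__ == "__main__":
    for coverage in (8, 14.4):
        reads = min_minor_allele_reads(coverage)
        print(f"{coverage}x coverage: at least {reads} reads carry the minor allele")
```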

To understand the viability of our interspecific SNPs 
as markers, we utilized the 1,958 P. davidsonii x P. 
cyananthus and 2,348 P. davidsonii x P. dissectus SNPs 
identified in the 14,254 and 10,053 respective homologous 
contig pairings (Tables 4 and 6). After removing contigs 
lacking identifiable SNPs, putative repetitive elements, 
and nonnuclear plastid DNA, 431 remained. Of these 
contigs, 99 were homologous across all three species 
(P. cyananthus, P. davidsonii and P. dissectus), another 
164 were only in the P. davidsonii x P. cyananthus 
comparisons, and the remaining 168 were in the 
P. davidsonii x P. dissectus contigs. Of those 431 
contigs, we selected the first 192 for SNP marker development, 
86 from each of the species comparisons. 
These contigs were utilized for competitive allele-specific 
PCR SNP primer design using PrimerPicker 
(KBioscience Ltd., Hoddesdon, UK). 

Of the 192 SNP markers tested using KASPar genotyping 
chemistry, 75 (39%) produced consistent results for 
P. cyananthus, P. davidsonii, P. dissectus, and P. fruticosus 
(Table 7). All 75 SNP markers indicated polymorphisms 
between P. cyananthus, P. davidsonii, and P. dissectus, 
whereas only 16 (21% of the 75) produced results in P. 
fruticosus (Table 7). These results suggest that it is possible 
to develop intrageneric SNPs for Penstemon. However, it 
is unclear how viable these markers will be for use 
across all the species of the genus, since only 21% 
worked on all the species used in this GR-RSC study. 

Repetitive elements 

We identified 28.5%, 16.8%, 17.4% and 16.1% of the
respective sequence from P. cyananthus, P. dissectus,
P. davidsonii, and P. fruticosus as repeat elements using
RepeatModeler and RepeatMasker. Of these elements,
3.0-7.8% were identified as LTR (long terminal repeat)
retroelements, 0.3-1.0% as transposons, and the remainder
were unclassified (Table 5). Because RepeatModeler uses
RECON and RepeatScout to create a de novo model for
RepeatMasker in place of the Arabidopsis model, details
about the subcategories of LTRs and transposons
included in the model could not be addressed.
Maughan et al. [35] utilized GR-RSC on the Arabidopsis
lines Ler-0 and Col-4. Running RepeatModeler and then
RepeatMasker on their sequence data from these lines, we
found that an average of 6.2% of the sequence was identified
as repetitive elements, of which 4.4% were LTR retroelements
and 0.4% were transposons. By way of comparison,
the downloaded full "non-genome reduced" sequence
of Arabidopsis (TAIR10 release) had a similar 7.4% of the
sequence identified as repeat elements, of which 3.0%
were LTR retroelements and 0.2% were transposons
(Table 5 and Figures 3 and 4). These data suggest that,
at least for repetitive elements, the GR-RSC method yields
proportions similar to those found in the full sequence
of Arabidopsis.

Broderick et al. [5] hypothesized that the broad range
of genome sizes found among Penstemon species of the
same ploidy may be explained by retrotransposons. Lynch [60]
detailed a relationship between genome size and repeat
elements, suggesting that the number of elements increases
linearly with genome size [60-62]. The four
Penstemon species used in this study provide insufficient
evidence to establish a linear relationship between
genome size and repeat elements in Penstemon. However,
the three smaller, similarly sized Penstemon genomes
possess comparable quantities of repetitive elements,
whereas P. cyananthus (the largest genome) has a
substantially larger share of repeat elements (28.5% of
sequence versus 16.1-17.4% in the other three species;
Figure 3).
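
To put a number on that comparison, the sketch below computes how much larger the P. cyananthus repeat fraction is than that of the three smaller genomes. It assumes the percentages quoted in the text are fractions of the sampled sequence, so the ratio is a ratio of proportions and not of element counts; the dictionary and variable names are illustrative.

```python
# Repeat content (% of sampled sequence) as reported in the text.
repeat_pct = {
    "P. cyananthus": 28.5,
    "P. dissectus": 16.8,
    "P. davidsonii": 17.4,
    "P. fruticosus": 16.1,
}

largest = "P. cyananthus"
others = [pct for species, pct in repeat_pct.items() if species != largest]
mean_others = sum(others) / len(others)

# Fold difference between the largest genome and the mean of the other three.
fold_vs_mean = repeat_pct[largest] / mean_others
# Fold difference against the smallest genome surveyed (P. dissectus).
fold_vs_dissectus = repeat_pct[largest] / repeat_pct["P. dissectus"]

print(f"mean of three smaller genomes: {mean_others:.1f}%")
print(f"P. cyananthus vs mean: {fold_vs_mean:.2f}-fold")
print(f"P. cyananthus vs P. dissectus: {fold_vs_dissectus:.2f}-fold")
```

Both ratios come out near 1.7, in line with the "approximately 70% more prevalent" figure given in the abstract.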

Not only do repetitive elements strongly influence genome
size, but they are also likely to evolve more rapidly than
low-copy sequence [62,63]. Repetitive elements of a
species therefore take on unique "fingerprints", which become
valuable in phylogenetic relationship studies [64,65].
Our limited four-species Penstemon genomic data
set agrees with two hypotheses: first, that
repetitive elements are a major component of the genome
size variation identified by Broderick et al. [5]; and second,
that these elements vary between the species we
tested, suggesting the possibility of identifying species



Table 7 Penstemon SNP marker name, GenBank dbSNP accession ID, polymorphism type, KASPar™ primer sequences (A1, A2 and common allele specific reverse) for 
all 75 functional SNP assays 



Name | Contig source¹ | SNP GenBank accession #² | SNP type | Allele-specific A1 forward (5'→3')³ | Allele-specific A2 forward (5'→3')³ | Common allele-specific reverse | Genotype by species (P. davidsonii, P. cyananthus, P. dissectus, P. fruticosus, P. pachyphyllus, P. cyananthus + P. davidsonii, P. dissectus + P. davidsonii)



PenSNP00001 | 00336CD | JX649978 | A/G | AAGATTGCATGGAGAGGAAATGGA^ | AGATOCATGGAGAGGAAATGGATC | CGATGCAAATGGCAGATCGGAGAAA |
PenSNP00002 | 00405CD | JX649979 | C/T | ACGCGAGTAATAAGTOG^OTC | GACGCGAGTAATAAGTOGwcm | CCAACAOTCCGCAGAAGCTCTOA |
PenSNP00003 | 02625CD | JX649980 | A/T | AAAAGCTCCCAAACATGACTATGAACT | AAAAGCTCCCAAACATGACTATGAACA | AATOTCGACAC1TGAAGAGAGCGTAA |
PenSNP00004 | 02857CD | JX649981 | A/C | ATCAAATGAAC1TGTCTCATGAGCCT | CAAATGAACTOTCTCATGAGCCG | GCAACAAGGTGCAAAAAA^GTAGCGTAA |
PenSNP00005 | 03943CD | JX649982 | A/G | ACTACCAAAACTACCOTccc™ | ACTACCAAAACTACCOTCCOTG | GGGGTACAGAGTOAGAAGAAGGAA |
PenSNP00006 | 04420CD | JX649983 | A/C | TGTCTCTAAATCGATATGATGAGGCT | GTCTCTAAATCGATATGATGAGGCG | GJGGJTCYTCCCCmAGAGGAOT |






PenSNP00007 | 08446CD | JX649984 | A/T | GGCAACATCCTCAGCAGAGACA | GGCAACATCCTCAGCAGAGACT | CCGACTCCCTTAGCAATOTAGCAT |
PenSNP00008 | 11303CD | JX649985 | C/G | GGGTGGTATOGTOC^ATGGG | GGGTGGTATTGGTOOTTOTGGC | CGGTATAAGAGCAACTAAGCTAAATGAOT |
PenSNP00009 | 11357CD | JX649986 | C/T | ACAATAmGATAATOA^CTCAAGTGCG | CACAATAmGATAATCATTCTCAAGTGCA | AAGGATGCAGTGAGACAAAAGCTAAGAT |
PenSNP00010 | 11935CD | JX649987 | A/C | AGCCTGA™TCCCTOAACCCAA^ | GCCTGA™TCCCTOAACCCAATG | GAATCACGGCGGGGGAGCAAAT |
PenSNP00011 | 12047CD | JX649988 | C/T | mGGCACTGCAGTGACCATC | C^GGCACTGCAGTGACCA^ | TGCTCCAGTCCGAAGGAAGTOAAT |
PenSNP00012 | 12119CD | JX649989 | A/G | AAGATAGACGTGGTAmCTOAGCA | AGATAGACGTGGTAmCTTCAGCG | GCAATOGTCACAGACCATAGTGG |
PenSNP00013 | 12398CD | JX649990 | A/T | TA^COTTCTGCAATCTCAACATOA | ATTTTCOTTCTGCAATCTCAACATOT | GTOAGTGTGA^AGAGTGCAmAG^ |

















PenSNP00014 | 13398CD | JX649991 | A/C | AGGCCTGTGGCTGAOTGTCA | GGGGTGTGGCTGAOTGTCC | GGCATATCTTOCCCG^TCCACAA | X Y X X X H H
PenSNP00015 | [illegible] | JX649992 | A/C | AAATGGTGGGTGAIIMGACCATATGA | ATGGTGGGT[illegible]CATATGC | GTCAACGGATTTGTGGAAGTCGGTA | Y X Y Y H H
PenSNP00016 | [illegible] | JX649993 | C/G | Tr,AAAATTTGAGATTTAATGAACAAACAGTC | r,AAAATTTrAGATTTAATGAACAAACAGTG | AGAGTTGTAACAAATTCCTTGGGTCCAAA | X Y X Y H
PenSNP00017 | [illegible] | JX649994 | A/G | TGACCAAGGAATGTGTTGAAGAAOT | GACCAAGGAATTTfTnTAAGAACTC | [illegible]GGTGTTTGAGGTCTA | Y X Y H H
PenSNP00018 | [illegible] | JX649995 | G/T | TAGGTGGAAT[illegible]ACATOG | GTTAGGTGGAATTGTGATGGAACATOT | CTAAGTGACAAGCACAAGGA | X Y X X H H
PenSNP00019 | [illegible] | JX649996 | G/T | ATGGTGGTGGTTTGGATGAAAGC | CATGCTCCTCCmGCATCAAAGA | HAt^rrAAGGTGGAGTGCTOTAm | Y X Y Y X H
PenSNP00020 | 17816CD | JX649997 | A/G | AAGGACTGAGTACCAAGACAGATCT | GGACTGAGTACCAAGACAGATCC | GCCAGGGTACTGAACCTGTC^A | X Y X X H H






PenSNP00021 | 18745CD | JX649998 | C/T | AGCATATTGAAAAGATCAGTCGCATAG | AAAGCATATTGAAAAGATCAGTCGCATAA | CAGCTGCTCCTATCCAATCTOGAA |
PenSNP00022 | 19267CD | JX649999 | A/G | AAATACCTGAGOTCTGCCTOTGT | ACCTGAGC1TCTGCCTOTGC | GATGCTCGTCATOTGCTCAACGAT |
PenSNP00023 | 21409CD | JX650000 | C/G | ACCATOAGGTAATAmCCAAAGGC | ACCATOAGGTAATAmCCAAAGGG | AGCGGTCTAGAACCGTCAATGOT |
PenSNP00024 | 22934CD | JX650001 | A/G | GTACAATOTCAAGTGTGTA^CTOCATA | ACAATOTCAAGTGTGTA^TCTOCATG | GCACTGCACCATOATGCCCTAAAA |
PenSNP00025 | 22942CD | JX650002 | A/T | ATCCGATCTTCGTCTACTATGCCA | ATCCGATOTTCGTCTACTATGCCT | AGAAAAGCACAAGCTGAAATCAGGGAA |
PenSNP00026 | 27992CD | JX650003 | A/G | TCCTCCTCGTCTOTCCTCTT | CCTCCTCGTCTOTCCTCTC | OTGGACCGTCCAAAGAAGGAAAGAA |
PenSNP00027 | 01179DD | JX650004 | A/G | TCGACCCCAACCTGTCACA | OTCGACCCCAACCTGTCACG | OTGOTGG^TCGGAAAGAG |

















PenSNP00028 | [illegible] | JX650005 | [illegible] | [illegible]TGGTTTGAACmGTC | CACTTGTGAT[illegible]AACmG^ | CTACCAAACTCACTCTAACATCCGGAT | X Y X X H H
PenSNP00029 | [illegible] | JX650006 | A/G | TGGTCTTGTT[illegible]ACCATTACGCAT | [illegible]CTTTACCATTACGCAC | hAAGTAGCTG[illegible]AGGAAG^ | X X Y X X X H
PenSNP00030 | [illegible] | JX650007 | A/G | AGTAGTACAr,AATAfTTAAAACTATCACCA | (TTAfTTArAfiAATACTTAAAACTATCACCG | GTTGGGGGAGTTGCCTTCTTGAAAT | X X Y X X H
PenSNP00031 | [illegible] | JX650008 | C/T | [illegible]TGTCCTTATGTGCAG | [illegible]TTTGTrrTTATGTGCAA | AAGGCTTAGCTTGGATGATATCCTACAA | Y Y X Y Y Y
PenSNP00032 | [illegible] | JX650009 | A/T | GTCACCGCCJCCGATTGAGATT | GTCACCGCCTGGGATTGAGATA | CGGCITTTGACGCCGCCGJAAA | X Y X X H H
PenSNP00033 | [illegible] | JX650010 | G/T | GTTGATTfTArAGATCTTAATTCTTGATTG | AGTTGATTfTA[illegible]AGATCTTAA[illegible] | TAfTArAAAGGGTAAAAAGTGCAATOATA | X Y X X H H
PenSNP00034 | 08307DD | JX650011 | C/G | ACATOAGGGTCCACCAAAAATCCG | ACATOAGGGTCCACCAAAAATCCC | GCGCAATOAAATCTCTOAATCACCTGGT | Y Y X Y H














PenSNP00035 | 08352DD | JX650012 | A/T | AGTACAAGGAAAAACC^ATOGTAAGTATA | AGTACAAGGAAAAACC^ATTAGTAAGTA^ | CTGACACAAACCCATOTAATATGACCAA | X
PenSNP00036 |  | JX650013 | A/T | GTGTOGAGAGCCAGGTGCGA | GTGTOGAGAGCCAGGTGCGT | GTATOAGGATCATOTGACAAAAAACATA | Y
PenSNP00037 | 08608DD | JX650014 | C/T | GTAGATAAGTOATOCGAGAGGC | GGTAGATAAGTOATOCGAGAGGT | CCAAACAAATGCACCACA^CTCOT | X
PenSNP00038 | 08831DD | JX650015 | A/T | mGAACTGCCATGTAAAGTTG^AGA | TOAACTGCCATGTAAAG^G^AGT | A^GAACCAAGGAGCTATCAGAGG |
PenSNP00039 | 08947DD | JX650016 | A/T | GGGATCGTAAAACTCAGGAAAAATGA | GGGATCGTAAAACTCAGGAAAAATGT | TCAGATACTCGTGGGGTOTCGA^ |
PenSNP00040 | 08959DD | JX650017 | A/G | AGAGAATGAAGAAGGAGAAGGAAGAAA | GAGAATGAAGAAGGAGAAGGAAGAAG | CTCCTACGGTOCATOTCGGTAGTA |
PenSNP00041 | 09272DD | JX650018 | A/T | TCTACAAAACAATCAGCAGTCATCA^ | TCTACAAAACAATCAGCAGTCATCATA | TCGACACOTTOCCTOTCTTGAA |






PenSNP00042 | 09369DD | JX650019 | C/T | G^mATACGCATCCATATACATAATAATAG | G^mATACGCATCCATATACATAATAATAA | GGTOACTCTCCAGAAATAAAATCTOTAT | Y X Y Y H X
PenSNP00043 | 09764DD | JX650020 | A/G | AATOAACGTCAAATOCAAGGTOCA | CAACGTCAAA^GCAAGGTOCG | TOACTATACCGGCTGAGTTGGCAT | Y X Y Y H H
PenSNP00044 | 10765DD | JX650021 | A/G | [illegible]AATAAATATCCTGGTGGATAAmAT | ^mAATAAATATCCTGGTGGATAAmAC | AAATOAGTGGATGGCTAGGAAGACTAA | X Y X Y X H H
PenSNP00045 | 10870DD | JX650022 | A/T | AGATCTGGAGACTAAAT | AGATCTGGAGACTAAAA | CGAAGAG^TGGGTGGGCGGAT | X Y X X Y Y
PenSNP00046 | 11107DD | JX650023 | C/T | GTCCGACGTGACAATGCAGC | CTGTCCGACGTGACAATGCAGT | CGCCGTCAAAGAGACmGTOGAT | Y X Y Y H H
PenSNP00047 | 11531DD | JX650024 | C/T | AGAAGATO^CGGCTGGGAGC | AAGAAGATO^CGGCTGGGAGT | TCTTCACATGATOCGACAATGGCTGAAT | X Y X X H H
PenSNP00048 | 11655DD | JX650025 | A/G | ACGTCCATGGAGGACCATAAA | CTACGTCCATGGAGGACCATAAG | GCTGTOTCCTGCAAGGAACYTCYT | X Y X X H H
PenSNP00049 | 11974DD | JX650026 | G/T | AAAATGCATGTAGmGGmACG | AAAATGCATGTAGmGGmACT | CACACCCCCAAAGGAAGAATAGCAT | Y X Y Y H H






PenSNP00050 | 13159DD | JX650027 | C/T | TGAATGTACmTCATOATAGAGAACG | GTOAATGTAC^CATOATAGAGAACA | AACAATAGTACAACACAACTAAAGCAGAGA | Y
PenSNP00051 | 13463DD | JX650028 | G/T | GCCmGACGGCGAAGGATTTC | CGCCmGACGGCGAAGGAmA | GCAAGCACGGCACTAAGCCOT | X
PenSNP00052 | 14334DD | JX650029 | A/G | AGAAACAACAAATACGAATAAATCACCCA | GAAACAACAAATACGAATAAATCACCCG | TOGAAAATOTGOTGAATCACGCAGT | X
PenSNP00053 | 00290DD03373CD | JX650030 | C/T | TGCCmGCGTCGCCACAATC | CITGCC^TGCGTCGCCACAA^ | AGCTAAGAGATGGGCAGACTTOCAAAAT |
PenSNP00054 | 00354DD04637CD | JX650031 | A/G | GCAAAAGGGAACCCTCAmCG^ | CAAAAGGGAACCCTCAmCGTC | TAOTGTCTGGGAC^COTTCTCm |
PenSNP00055 | 01161DD11697CD | JX650032 | A/G | ACTGGTAAATACACTACGTOACAGT | CTGGTAAATACACTACGTOACAGC | GAAACACAGCAGCCCAACGACATAT | Y
PenSNP00056 | 01323DD15501CD | JX650033 | A/G | ACCTGAAGAAmGTOACTAOTCGT | CCTGAAGAAmGTOACTAC1TCGC | GGATCGGGTGGAACGAmGTG^ | X

















PenSNP00057 | [illegible] | JX650034 | A/G | AATTA^AArCkCklCCkCTGATOCAA | AGAACCACATfTACTGATTCCAG | GGAGCCCAAACC[illegible]AGATTC^CTA | Y X Y Y H H
PenSNP00058 | [illegible] | JX650035 | A/G | GTGATTGTTAAATfTCAATATATAA[illegible] | GTGATTGTTAAATfTCAATATATAAmcmc | GTAGGAGGGTTrr^AAAAAr.ACCAGAT | X Y X X H H
PenSNP00059 | [illegible] | JX650036 | A/C | AA^A^fvrTCATGGTAAGTTATCGAGA | Af^A^fTTTCATrrTAAGTTATCGAGG | r,AAr,AAAATrATTGTGGAGATCTCGTGTA | X Y X X Y H
PenSNP00060 | [illegible] | JX650037 | C/T | [illegible]ACTAATGTTCTCACG | [illegible]TCTCACA | C-,CATTTCTTG[illegible]CAAGATGTA | X X Y Y X X H
PenSNP00061 | [illegible] | JX650038 | A/C | AATTGTTGTA[illegible]ATCGGAT | CTTCTACGTGGATTTGATCGGAG | TATTGTTAGACATGGACATGGAAATOAGA | Y X Y Y H H
PenSNP00062 | [illegible] | JX650039 | A/T | AAATGGGTGAGGTGAAAmCCGCA | AAATGGGTCAGCTGAAAmCCGCT | GTGTTGTTTAG[illegible]TC[illegible] | Y Y X Y H
PenSNP00063 | 05160DD08243CD | JX650040 | C/G | TCGATCGTOAAATGATAATTGATACAAG | CGATCGTOAAATGATAATOATACAAC | GATCCCATAGAC[illegible]AAGGATOTAA | Y X Y Y Y H H















PenSNP00064 | [illegible] | JX650041 | A/C | ATrAAATr,rrATA^ATrfTCCAGAm | rAAATr.rrA[illegible]CAGATO | ACATTTCCTACACCAACTTCTTCCACTA | X X Y X X H
PenSNP00065 | [illegible] | JX650042 | C/G | AGCTGTTC[illegible]TCATGAATG | AGCTGTOAGGAGGTOATGAATC | CACCAJGTGAACCAACACTATOTCAm | X Y X X H H
PenSNP00066 | 09773DD14323CD | JX650043 | A/G | TCATGCCCATOCCCA | CATGCCCATOCCCG | CCTGGTATGAACATGGGGAGGTOT | Y Y X Y Y
PenSNP00067 | [illegible] | JX650044 | C/G | [illegible]HAAATfAATCCGC | [illegible]ATTCAAATCAATCCGG | (TmTATATfTCCCTTTGAGCTTCTTGAA | Y Y X Y Y Y H
PenSNP00068 | [illegible] | JX650045 | A/G | GTGGCAGJGTGAAACTGCATCA | GTGGCAGTCTGAAAfTGCATCG | [illegible]AGGTOAT | Y X Y Y H H
PenSNP00069 | [illegible] | JX650046 | A/C | ArrAAATA(TTATTAGrTCCAGTCGAA | CCAAATAC™™gctCCAGTCGAC | HArTCAAf^,GAGAGGC | Y Y X Y Y H
PenSNP00070 | 11564DD17128CD | JX650047 | C/T | TGGAC1TGGCATOAAACAAAAGATC | AATOGACTOGCATOAAACAAAAGA^ | ATATGAAACTCCCCACAAGAAA | Y X Y X H H






PenSNP00071 | 11647DD17264CD | JX650048 | C/G | GCACGAGCCAAAATCCTGAGC | GCACGAGCCAAAATCCTGAGG | ATOGCATGTGTATCCTGTGTGGGA |
PenSNP00072 | 11671DD17144CD | JX650049 | C/T | GTGCAGCAACCCCTA^CATGAC | ATGTGCAGCAACCCCTATOATGAT | CCTGTCCAAAACATATGATOTCATOGAA | X
PenSNP00073 | 12915DD17470CD | JX650050 | A/G | AAGAAAAGGGTGGACAAATOAACCGT | GAAAAGGGTGGACAAA^AAACCGC | CAGAACAACATCATAOTGATAAATCTOT |
PenSNP00074 | 13828DD14937CD | JX650051 | C/T | GTAAGATATGCTGCCAGATGG | GTAAGATATGCTGCCAGATGA | CTCTGAAGAAG^^GTCCTTGATAGCTA |
PenSNP00075 | 14286DD18608CD | JX650052 | G/T | GTATOAGAGCCACTACCGG | CTGTATOAGAGCCACTACCGT | CCAOTGAATTGmGAAGAGmGGGAA | Y X Y Y X H

¹ These contigs have been deposited at DDBJ/EMBL/GenBank as a Whole Genome Shotgun project under the accessions AKKG00000000 (P. cyananthus), AKKH00000000 (P. dissectus), AKKI00000000 (P. davidsonii), and
AKKJ00000000 (P. fruticosus). The version described in this paper is the first version for each accession, XXXX01000000.
² The GenBank accession identification for the full sequence for each allele with the specific SNP bp identified.

³ KASPar™ primers: A1 and A2 primers are SNP allele specific. All A1 forward primers had the following universal primer, GAAGGTGACCAAGTTCATGCT, added to the 5' end of the allele-specific sequence. All A2 forward
primers had the following universal primer, GAAGGTCGGAGTCAACGGATT, added to the 5' end of the allele-specific sequence.
⁴ H = heterozygous compared to either homozygous condition for either "X" or "Y".
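
The tailing rule in footnote 3 can be illustrated with a short sketch. It assumes the two universal sequences quoted in the footnote and uses the PenSNP00003 (A/T) allele-specific sequences from Table 7 as the worked example; the helper name `add_tail` and the ACGT-only check are illustrative additions, not part of the published protocol.

```python
# Universal 5' tails quoted in footnote 3 of Table 7.
A1_TAIL = "GAAGGTGACCAAGTTCATGCT"
A2_TAIL = "GAAGGTCGGAGTCAACGGATT"


def add_tail(tail: str, allele_specific: str) -> str:
    """Return the full forward primer: universal tail, then allele-specific part."""
    seq = allele_specific.upper()
    # Reject anything that is not a plain nucleotide (hypothetical sanity check).
    if set(seq) - set("ACGT"):
        raise ValueError(f"non-ACGT characters in primer: {allele_specific!r}")
    return tail + seq


# PenSNP00003 (A/T) allele-specific forward sequences, as printed in Table 7.
a1_specific = "AAAAGCTCCCAAACATGACTATGAACT"
a2_specific = "AAAAGCTCCCAAACATGACTATGAACA"

a1_primer = add_tail(A1_TAIL, a1_specific)
a2_primer = add_tail(A2_TAIL, a2_specific)

print(a1_primer)
print(a2_primer)
```

The two allele-specific parts differ only at their 3' terminal base, which is the position that discriminates the SNP alleles in a competitive allele-specific PCR assay.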
Figure 3 Percentage of retroelements, DNA transposons and other unclassified repeats in Penstemon cyananthus, P. dissectus,
P. davidsonii, P. fruticosus, and both genome reduced and non-genome reduced Arabidopsis¹. ¹ Genome reduced A. thaliana sequence
from Maughan et al. [35]; A. thaliana RILs Ler-0 and Col-4; non-genome reduced A. thaliana sequence downloaded from TAIR
(The Arabidopsis Information Resource) as whole chromosomes; the diploid (2n = 2x = 16) genome size as reported by Broderick et al.
and Schmuths et al. [5,59].



specific repetitive elements. However, without further 
comparisons we were unable to identify specific repetitive 
elements associated with the four Penstemon species used 
in this study. 

Gene ontology 

Using BLASTX, we identified an average of 21.5% of
the contigs across the four species as putative genes,
with an average of 13.9% annotated by Blast2GO
(Table 5). These putative genes were compared and
contrasted in a more detailed study by Dockter [23],
who also compared the Penstemon sequences
to known genes from the related genera Antirrhinum
and Mimulus and identified nine putative Penstemon genes
from Antirrhinum and 14 from Mimulus with an e-value
below 1.0e-13. Three genes (NADH dehydrogenase from M.
aurantiacus, ribosomal protein L10 from M. guttatus,
and ribosomal protein subunit 2 from M. aurantiacus,
M. szechuanensis, and M. tenellus var. tenellus) were
perfect hits (e-value = 0.0).

Figure 4 Relationship between genome size and repeat elements in Penstemon, including the relationship of both LTRs and total
repeat elements to genome size for both genome reduced Penstemon and genome reduced/non-genome reduced Arabidopsis¹
(yellow). ¹ Genome reduced A. thaliana sequence from Maughan et al. [35]; A. thaliana RILs Ler-0 and Col-4; non-genome reduced A. thaliana
sequence downloaded from TAIR (The Arabidopsis Information Resource) as whole chromosomes; genome size as reported by Broderick et al.
and Schmuths et al. [5,59].

Conclusions

Penstemon are recognized for their phenotypic variation
and their adaptation to multiple environments
[6-8,13,14,17,30,31]. Broderick et al. [5] found that
this diversity is reflected by a wide range in their genome
sizes. Nevertheless, even with this demonstrated plasticity,
we have identified evidence of a high level of
sequence conservation across the genus. This apparent
sequence conservation is consistent with the hypothesis
that Penstemon radiated rapidly into its many
species rather recently in evolutionary time [13,14].
Furthermore, our study identified evidence that the
genome size variation in Penstemon is rooted in the
amount of repetitive elements in each species.

Despite the large differences in Penstemon genome
size, the finding that the genus has a great deal of
sequence conservation is invaluable for the development
of interspecific markers. The further development and
mapping of conserved markers will facilitate
the domestication of xeric Penstemon cultivars via
interspecific hybridization, which remains largely unexploited,
mainly because of crossing barriers [6-8,10-12,15]. Viehmeyer
[16] hypothesized that it might be possible to develop
Penstemon breeding lines that would facilitate the
indirect interspecific hybridization of any two species
within the genus. He and others have used traditional
breeding techniques to develop a number of interspecific
hybrids [7,11,15,17,66]. Clarifying the phylogenetic
relationships within the genus should facilitate these
objectives [67]. In the largest Penstemon phylogenetic
study conducted to date, Wolfe et al. [14] sequenced
the ITS and two chloroplast genes in 163 species.
They concluded that many species are polyphyletic in
origin, which makes them difficult to discriminate
from one another, and that additional molecular
studies are required to define taxonomic relationships
more accurately.

We tested 51 SSR/INDEL-based markers (Table 3)
and identified several thousand inter- and intraspecific
SNPs (Table 6), all of which have potential as both
inter- and intraspecific markers. Of the 51 SSRs/
INDELs, we selected 12 to test across 93 Penstemon
taxa. We used the resulting data in an attempt to define
the phylogenetic relationships of those taxa more clearly,
but the results were incoherent. It is possible that some
of these markers represent more than one locus
in the Penstemon genome. This situation has been
identified by others as a potential weakness of using
SSR-based markers in interspecific phylogenetic studies
[46,47]. A major reason for the ambiguity in Penstemon's
phylogeny is that the genus appears to have evolved quite
recently and radiated rapidly, leaving weak species boundaries
[13,14]. Furthermore, there are a number of reports
of speciation via natural interspecific hybridization
within the genus [14,68-73]. Therefore, like
Wolfe et al. [14], we concluded that better marker
data sets will be required to reduce the present phylogenetic
ambiguity.



Gaining clearer insights into the relationships within
Penstemon will take carefully designed large-scale
sequencing studies. Some methods are showing promise
for doing such studies economically.
One example would be to use GR-RSC or similar
methods, which sample large quantities of homologous
sequence from a genome at ever-decreasing cost
[18,20,74]. Since our SSR/INDEL, sequence, and SNP
data have demonstrated broad applicability across
Penstemon, further studies applying this same GR-RSC
protocol and downstream analysis to additional species
would allow broader comparisons of putative genes,
repeat elements, SNPs and SSRs, facilitating a much
better understanding of the genus. Furthermore, using
this technique on carefully selected parents and their
segregating progeny would allow Penstemon genetic
mapping studies, which would greatly enhance breeding
and domestication efforts within the genus. Historically,
studies of this nature would have been unthinkable;
however, mass homologous loci sequence studies are
rapidly becoming feasible [18,20,74]. In the interim,
the 75 SNPs reported here, along with others not yet
developed, could be tested in a much broader study for
around US$0.05 per data point [18,20]. Studies of
homologous SNPs across many Penstemon taxa, similar to
the Amaranthus study of Maughan et al. [20], should
assist in developing improved insights into Penstemon
phylogenetic relationships and in producing high-quality
genetic maps from carefully designed segregating
Penstemon populations.
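
As a rough illustration of what the quoted per-data-point price implies, the sketch below estimates the genotyping cost of screening the 75 functional SNPs. The number of taxa is an assumption borrowed from the 93-taxon SSR/INDEL survey described above; the text does not specify the size of any future SNP study, and DNA extraction, labor and repeat runs are not included.

```python
# Figures taken from the text.
n_snps = 75              # functional SNP assays reported here
cost_per_point = 0.05    # approximate US$ per data point [18,20]

# Assumption: the same 93 taxa used for the SSR/INDEL survey are genotyped.
n_taxa = 93

data_points = n_snps * n_taxa          # one data point per SNP per taxon
cost = data_points * cost_per_point    # genotyping chemistry only

print(f"{data_points} data points, about US${cost:.2f}")
```

Under these assumptions the whole panel would cost a few hundred US dollars in genotyping reactions.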